We describe database and system development for speaker-independent recognition of telephone speech. The telephone speech database contains about 4,000 callers from the USA and Canada, each of whom provided several utterances, including city names, first and last names, spelled names, and answers to yes/no questions. About 1,000 of the callers recited the English alphabet with pauses between letters. A portion of the database has been verified and phonetically labeled, and this portion was used to develop a baseline system that recognizes names spelled with pauses between letters. The system uses a neural network to segment speech into a sequence of 24 phonetic categories. The phonetic categories are used to hypothesize a sequence of letters, which are then reclassified using a second neural network. First choice letter recognition accuracy was 87.6% in the best condition. First choice name retrieval was 85.5% for 200 spelled names retrieved from a database of 50,000 common last names.
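The name retrieval step can be illustrated with a minimal sketch. The abstract does not specify how names are retrieved, so everything here is an assumption: the function `retrieve_name`, the per-letter probability dictionaries, the probability floor, and the tiny lexicon are hypothetical, and the scoring rule (sum of log-probabilities over letter positions) is one plausible choice rather than the method used in the paper.

```python
import math

def retrieve_name(letter_scores, lexicon):
    """Return the lexicon name with the highest total log-probability.

    letter_scores: one dict per spoken letter, mapping letter -> probability
                   (hypothetical output of a letter classifier).
    lexicon: iterable of candidate names (uppercase strings).
    """
    floor = 1e-6  # assumed floor for letters the classifier did not score
    best_name, best_score = None, -math.inf
    for name in lexicon:
        if len(name) != len(letter_scores):
            continue  # this sketch ignores letter insertions and deletions
        score = sum(math.log(scores.get(ch, floor))
                    for ch, scores in zip(name, letter_scores))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical classifier output for a four-letter spelled name.
scores = [
    {"C": 0.6, "Z": 0.3, "T": 0.1},
    {"O": 0.9, "A": 0.1},
    {"L": 0.7, "O": 0.3},
    {"E": 0.5, "B": 0.3, "D": 0.2},
]
lexicon = ["COLE", "ZOLA", "COLB", "FANTY"]
print(retrieve_name(scores, lexicon))
```

Scoring against a lexicon lets an uncertain letter be corrected by the remaining letters, which is one way name retrieval can stay usable even when individual letters are misrecognized.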
Bibliographic reference. Cole, Ronald / Roginski, Krist / Fanty, Mark (1991): "English alphabet recognition with telephone speech", In EUROSPEECH-1991, 479-482.