We describe a real-time, speaker-independent French alphabet recognizer that is accurate enough for commercial use. The system (a) digitizes a sequence of letters separated by brief pauses and computes a RASTA-PLP spectral representation, the zero-crossing rate, and the peak-to-peak amplitudes of the waveform; (b) uses a neural network to assign 23 phonetic category labels to successive time frames; (c) performs an initial segmentation of the speech by mapping the phonetic label scores for each frame to pronunciation models for each letter using a modified Viterbi search; (d) performs a second classification of each hypothesized letter using the segment boundaries provided by the first-pass segmentation, producing a set of 26 letter scores plus a score for the category "Not-A-Letter"; and (e) uses these 27 scores to identify the spelled word in a database. The system was evaluated on calls that were not used to train either network, and it achieved 84.4% first-choice letter recognition accuracy on this test set. It was also evaluated on 84 spelled names from different callers, where it correctly recognized 92.8% of the names, all of which were contained in a database of 50,000 names. The final system has been optimized to run in real time on a PC board built around a single TMS320C30 DSP. The two passes described above run in real time on the DSP, while the name search (up to 50,000 names) is performed by the PC as letters are recognized.
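As an illustration of the two waveform measures named in step (a), the following minimal sketch computes a zero-crossing rate and a peak-to-peak amplitude for each frame of a sampled signal. It is not the authors' implementation: the function name `frame_features`, the non-overlapping framing, and the default frame length of 80 samples (10 ms at an assumed 8 kHz telephone sampling rate) are assumptions made for this example, and the RASTA-PLP analysis is not shown.

```python
from typing import List, Sequence, Tuple


def frame_features(
    samples: Sequence[float], frame_len: int = 80
) -> List[Tuple[float, float]]:
    """Return (zero_crossing_rate, peak_to_peak) for each full frame.

    Assumptions (not taken from the paper): frames do not overlap,
    frame_len defaults to 80 samples, and a trailing partial frame
    is dropped.
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")

    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]

        # Count sign changes between consecutive samples; zero is
        # treated as non-negative.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        # Normalize by the number of adjacent sample pairs in the frame.
        zcr = crossings / (frame_len - 1)

        # Peak-to-peak amplitude: largest minus smallest sample value.
        p2p = max(frame) - min(frame)

        features.append((zcr, p2p))
    return features
```

For example, calling `frame_features` with `frame_len=8` on an alternating frame `[1, -1, 1, -1, 1, -1, 1, -1]` followed by a constant frame of eight `3`s gives a high zero-crossing rate for the first frame and zero for the second, the kind of contrast that can help separate noisy fricative-like frames from steadier ones.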
Bibliographic reference. Schmid, P. / Cole, Ronald / Fanty, M. / Bourlard, Hervé / Haessen, M. (1993): "Real-time, neural network-based, French alphabet recognition with telephone speech", In EUROSPEECH'93, 1723-1726.