Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


Real-Time, Neural Network-Based, French Alphabet Recognition With Telephone Speech

P. Schmid (1), Ronald Cole (1), M. Fanty (1), Hervé Bourlard (2), M. Haessen (2)

(1) Center for Spoken Language Understanding, Oregon Graduate Institute, Portland, OR, USA
(2) Lernout & Hauspie Speech Products, Ieper, Belgium

We describe a real-time, speaker-independent French alphabet recognizer that performs with sufficient accuracy for commercial use. The system (a) digitizes a sequence of letters separated by brief pauses and computes a RASTA-PLP spectral representation, the zero-crossing rate, and the peak-to-peak amplitudes of the waveform; (b) uses a neural network to assign 23 phonetic category labels to successive time frames; (c) performs an initial segmentation of the speech by mapping the phonetic label scores for each frame to pronunciation models for each letter using a modified Viterbi search; (d) performs a second classification of each hypothesized letter using the segment boundaries provided by the first-pass segmentation, producing a set of 26 letter scores plus a score for the category "Not-A-Letter"; and (e) uses these scores to identify the spelled word from a database. The system has been evaluated on calls that were not used for training either network, and achieved 84.4% first-choice letter recognition accuracy on this test set. It has also been evaluated on 84 spelled names from different callers, of which it correctly recognized 92.8% against a database of 50,000 names. The final system has been optimized to run in real time on a PC board based on a single TMS320C30 DSP. The two recognition passes described above are performed in real time by the DSP, while the name search (up to 50,000 names) is performed by the PC as letters are recognized.
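Step (e) can be illustrated with a minimal sketch in Python. It is not the search used in the system, whose details the abstract does not give. It assumes that each hypothesized letter segment yields a dictionary of per-letter scores in (0, 1], that a name is scored by summing the log scores of its letters, and that the number of segments matches the name length. The names `score_name`, `best_name`, and the `floor` parameter are hypothetical, and the "Not-A-Letter" score is left out for brevity.

```python
import math


def score_name(name, letter_scores, floor=1e-6):
    """Sum the log scores of the name's letters, one segment per letter.

    Assumption: a name whose length differs from the number of segments
    cannot match (no insertions or deletions are modeled in this sketch).
    """
    if len(name) != len(letter_scores):
        return -math.inf
    total = 0.0
    for letter, scores in zip(name, letter_scores):
        # Letters with no score get a small floor so the log stays finite.
        total += math.log(max(scores.get(letter, 0.0), floor))
    return total


def best_name(names, letter_scores):
    """Return the database entry with the highest summed log score."""
    return max(names, key=lambda n: score_name(n, letter_scores))


# Toy example with made-up scores for three letter segments.
segments = [
    {"B": 0.40, "P": 0.35, "D": 0.25},
    {"E": 0.90, "A": 0.10},
    {"M": 0.55, "N": 0.45},
]
database = ["BEN", "PEN", "DAN"]
result = best_name(database, segments)
```

In this toy case the first-choice letters spell "BEM", which is not in the database, yet the lookup still returns "BEN". This suggests how constraining the answer to a name list can yield a name recognition rate higher than the first-choice letter accuracy.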

