5th International Conference on Spoken Language Processing
In this paper we characterize the sensitivity of two speaker-dependent isolated word recognizers to several kinds of variability and distortion, namely noise, channel, distance to the microphone, and target language. Both recognizers use a phoneme similarity acoustic front-end as a rich representation of speech from which reliable features are extracted. A cross-correlation test showed that a phoneme similarity front-end is more robust to variability and distortion (especially intra-speaker variability) than an LPC cepstral front-end. The first recognizer (Condor) uses a frame-based approach, while the second (Pasha) uses the phoneme similarity information contained in a small number of speech segments. The two recognition methods are presented with special emphasis on the robustness improvements and computational trade-offs that were made. Experimental results are reported for car noise at different speeds, for speakerphone versus handset input in an office environment, and for several target languages. Recognition accuracy greater than 94% was achieved in a car environment at 60 mph (Condor), and recognition accuracy greater than 95% was achieved for speakerphone input at a distance of 50 cm in an office environment.
Bibliographic reference. Morin, Philippe / Applebaum, Ted H. / Boman, Robert / Zhao, Yi / Junqua, Jean-Claude (1998): "Robust and compact multilingual word recognizers using features extracted from a phoneme similarity front-end", In ICSLP-1998, paper 0402.