In this paper, we summarize recent improvements in recognition accuracy for speaker-dependent connected-digit recognition in a noisy car environment. We carried out experiments on a database recorded in a moving car, with one part of the data uttered via a telephone handset and the other part in hands-free mode. Compared with the original system, the error rate averaged over 10 speakers was reduced from 3% to 1% in handset mode and from 10% to 2% in hands-free mode. This was achieved mainly by embedded training with an improved initialization and by incorporating dynamic information in the feature vector. To improve the robustness of the recognizer, we incorporated a spectrum normalization technique that aims to reduce the influence of acoustic channel variations and additive noise on the computed features. In the cross-tests (training in handset mode, recognition in hands-free mode), this technique outperformed high-pass filtering of the subband envelopes, which had recently been proposed to improve system robustness.
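The abstract names two front-end ideas without spelling them out: adding dynamic information to the feature vector and normalizing the spectrum against channel effects. The sketch below shows one common way to do each, under stated assumptions. It computes regression-based delta coefficients (a standard way to add dynamic information) and, as a simple stand-in for channel normalization, subtracts the per-dimension utterance mean from log-spectral or cepstral frames. This is not the authors' spectrum normalization technique, whose details the abstract does not give. The function names `delta_features`, `mean_normalize` and `augment` and the window size of 2 are hypothetical choices for illustration.

```python
from typing import List

Frame = List[float]  # one feature vector (e.g. log subband energies) per frame


def delta_features(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over +/- `window` frames.

    Edge frames are replicated so every frame gets a delta (an assumption;
    other padding schemes are equally common).
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    deltas: List[Frame] = []
    for t in range(n):
        dim = len(frames[t])
        acc = [0.0] * dim
        for k in range(1, window + 1):
            nxt = frames[min(t + k, n - 1)]
            prv = frames[max(t - k, 0)]
            for i in range(dim):
                acc[i] += k * (nxt[i] - prv[i])
        deltas.append([x / denom for x in acc])
    return deltas


def mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-dimension mean over the utterance.

    In the log-spectral domain a fixed channel acts as an additive offset,
    so removing the mean cancels it. This is a stand-in, NOT the paper's method.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    return [[f[i] - mean[i] for i in range(dim)] for f in frames]


def augment(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Normalized static features followed by their deltas, per frame."""
    static = mean_normalize(frames)
    dynamic = delta_features(static, window)
    return [s + d for s, d in zip(static, dynamic)]
```

Because a constant log-domain offset is removed by the mean subtraction (and contributes nothing to the deltas), `augment` returns the same vectors for a recording and a copy of it passed through a fixed-gain channel. The feature dimension doubles, which is the usual cost of appending dynamic information.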
Cite as: Geller, D., Haeb-Umbach, R., Ney, H. (1992) Improvements in speech recognition for voice dialing in the car environment. Proc. ETRW on Speech Processing in Adverse Conditions, 203-206
@inproceedings{geller92_spac, author={D. Geller and Reinhold Haeb-Umbach and Hermann Ney}, title={{Improvements in speech recognition for voice dialing in the car environment}}, year=1992, booktitle={Proc. ETRW on Speech Processing in Adverse Conditions}, pages={203--206} }