Automatic speech recognition on mobile devices has to cope with varying acoustic background noise, often at low SNR. Its performance in car noise environments is of particular interest to us. We focus on noise reduction techniques for speech enhancement as a means of preserving the accuracy of the speech recognition process. We report both word recognition rate and word accuracy; the latter also serves as a performance measure when no speech is present (i.e. background noise only).
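The abstract does not spell out how the two measures are computed. The sketch below assumes the commonly used definitions, in which word recognition rate counts only substitutions and deletions while word accuracy additionally penalizes insertions. The function names and the example counts are hypothetical and not taken from the paper.

```python
def word_recognition_rate(n_ref, substitutions, deletions):
    """Percentage of reference words recognized correctly (insertions ignored)."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - substitutions - deletions) / n_ref


def word_accuracy(n_ref, substitutions, deletions, insertions):
    """Like word recognition rate, but insertions are also counted as errors."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - substitutions - deletions - insertions) / n_ref


# Hypothetical counts accumulated over a test set of 100 reference words.
wrr = word_recognition_rate(100, substitutions=5, deletions=3)
acc = word_accuracy(100, substitutions=5, deletions=3, insertions=10)
print(wrr, acc)  # 92.0 82.0
```

Under these assumed definitions, words that the recognizer hypothesizes during noise-only segments are insertions. They lower word accuracy but leave word recognition rate unchanged, which is consistent with the abstract's remark that word accuracy remains informative when no speech is present.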
As a classical technique, we first investigate Wiener filtering with a voice-activity-driven estimate of the noise power spectral density (psd). We then compare it with the more advanced recursive least-squares (RLS) weighting rule for speech enhancement, and with minimum statistics as the noise psd estimator. Mel-based root-cepstral coefficients are used as an alternative to the conventional Mel-frequency cepstral coefficients (MFCCs). A-priori-SNR-based Wiener filtering combined with minimum statistics and Mel-based root-cepstral coefficients achieves a 33.16% relative improvement in word accuracy over the classical technique.
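As a rough illustration of the classical baseline, the sketch below updates a noise psd estimate only in frames that a voice activity detector marks as noise, and then derives a per-bin Wiener gain from it. It is not the authors' implementation: the smoothing constant `alpha`, the `gain_floor`, the simple estimate of the a-priori SNR as `max(|Y|^2 / noise - 1, 0)`, and all function names are assumptions made for this example.

```python
def update_noise_psd(noise_psd, frame_psd, speech_active, alpha=0.9):
    """Recursively smooth the noise psd, but only in frames flagged as noise-only."""
    if speech_active:
        # Freeze the estimate while the (assumed) VAD reports speech.
        return list(noise_psd)
    return [alpha * n + (1.0 - alpha) * y for n, y in zip(noise_psd, frame_psd)]


def wiener_gain(frame_psd, noise_psd, gain_floor=0.1):
    """Per-bin Wiener gain G = xi / (1 + xi), limited from below by gain_floor."""
    gains = []
    for y, n in zip(frame_psd, noise_psd):
        if n <= 0.0:
            gains.append(1.0)  # no noise estimated in this bin: pass it through
            continue
        xi = max(y / n - 1.0, 0.0)  # simple a-priori SNR estimate (assumption)
        gains.append(max(xi / (1.0 + xi), gain_floor))
    return gains


# Two frequency bins: one dominated by speech, one at the noise level.
noise = update_noise_psd([1.0, 1.0], [1.0, 1.0], speech_active=False)
print(wiener_gain([4.0, 1.0], noise))  # [0.75, 0.1]
```

The enhanced spectrum would be obtained by multiplying each noisy spectral bin by its gain. Replacing `update_noise_psd` with a minimum statistics tracker would remove the dependence on the voice activity decision, which is the direction the comparison in the abstract takes.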
Cite as: Setiawan, P., Suhadi, S., Fingscheidt, T., Stan, S. (2005) Robust speech recognition for mobile devices in car noise. Proc. Interspeech 2005, 2673-2676, doi: 10.21437/Interspeech.2005-257
@inproceedings{setiawan05_interspeech, author={Panji Setiawan and Suhadi Suhadi and Tim Fingscheidt and Sorel Stan}, title={{Robust speech recognition for mobile devices in car noise}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={2673--2676}, doi={10.21437/Interspeech.2005-257} }