In the missing data approach to robust Automatic Speech Recognition (ASR), time-frequency regions that carry reliable speech information are identified, and recognition is then based on these regions alone. In this paper, we address the problem of identifying the reliable regions and propose two criteria for doing so, one based on negative energy and one based on SNR. These criteria are evaluated on the TIDigits corpus for several noise sources and compared with spectral subtraction. We show that, on this task, the missing data method performs considerably better than spectral subtraction, and that combining the two techniques outperforms either technique used alone. We report robust performance at 0 dB SNR for car noise and at 10 dB SNR for factory noise.
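The abstract names the two reliability criteria but not their exact definitions. The sketch below shows one plausible reading, and every detail in it is an assumption rather than the paper's method. It assumes that (a) the noise power in each frequency channel is estimated as the mean of the first few frames, which are taken to be speech-free; (b) the negative energy criterion marks a time-frequency cell unreliable when subtracting that estimate (spectral subtraction without flooring) leaves zero or negative energy; and (c) the SNR criterion marks a cell reliable when its estimated local SNR, (Y - N)/N in dB, exceeds a threshold. The function names and the `threshold_db` parameter are hypothetical.

```python
import math
from typing import List

# Hypothetical representation: spectrogram[frame][channel] holds linear power values.
Spectrogram = List[List[float]]
Mask = List[List[bool]]  # True = cell treated as reliable


def estimate_noise(noisy: Spectrogram, n_frames: int) -> List[float]:
    """Assumed noise estimate: per-channel mean power over the first n_frames frames."""
    head = noisy[:n_frames]
    return [sum(col) / len(head) for col in zip(*head)]


def spectral_subtract(noisy: Spectrogram, noise_est: List[float]) -> Spectrogram:
    """Subtract the noise estimate from every frame, without flooring negatives."""
    return [[y - n for y, n in zip(frame, noise_est)] for frame in noisy]


def negative_energy_mask(noisy: Spectrogram, noise_est: List[float]) -> Mask:
    """Assumed negative energy criterion: a cell is unreliable if subtraction
    leaves zero or negative energy, reliable otherwise."""
    cleaned = spectral_subtract(noisy, noise_est)
    return [[s > 0.0 for s in frame] for frame in cleaned]


def snr_mask(noisy: Spectrogram, noise_est: List[float], threshold_db: float = 0.0) -> Mask:
    """Assumed SNR criterion: a cell is reliable if its estimated local SNR,
    10*log10((Y - N) / N), exceeds threshold_db (a hypothetical parameter)."""
    mask: Mask = []
    for frame in noisy:
        row = []
        for y, n in zip(frame, noise_est):
            speech = y - n
            if speech <= 0.0 or n <= 0.0:
                row.append(False)  # no positive speech estimate: treat as unreliable
            else:
                row.append(10.0 * math.log10(speech / n) > threshold_db)
        mask.append(row)
    return mask
```

For example, with `noise = [1.0, 1.0, 1.0]` and `noisy = [[1.0, 4.0, 0.5], [2.0, 10.0, 1.0]]`, the negative energy mask is `[[False, True, False], [True, True, False]]`, while `snr_mask(noisy, noise, threshold_db=3.0)` additionally rejects the cell with a local SNR of 0 dB, giving `[[False, True, False], [False, True, False]]`. A recogniser in this framework would then use only the cells marked `True`; how the paper combines the masks with spectrally subtracted features is not specified in the abstract.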
Cite as: Vizinho, A., Green, P., Cooke, M., Josifovski, L. (1999) Missing data theory, spectral subtraction and signal-to-noise estimation for robust ASR: an integrated study. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2407-2410, doi: 10.21437/Eurospeech.1999-528
@inproceedings{vizinho99_eurospeech, author={Ascension Vizinho and Phil Green and M. Cooke and Ljubomir Josifovski}, title={{Missing data theory, spectral subtraction and signal-to-noise estimation for robust ASR: an integrated study}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={2407--2410}, doi={10.21437/Eurospeech.1999-528} }