Machine Listening in Multisource Environments (CHiME) 2011

Florence, Italy
September 1, 2011

Mask Estimation and Sparse Imputation for Missing Data Speech Recognition in Multisource Reverberant Environments

Heikki Kallasjoki (1), Sami Keronen (1), Guy J. Brown (2), Jort F. Gemmeke (3), Ulpu Remes (1), Kalle J. Palomäki (1)

(1) Department of Information and Computer Science, Aalto University School of Science, Finland
(2) Department of Computer Science, University of Sheffield, UK
(3) Department ESAT, Katholieke Universiteit Leuven, Belgium

This work presents an automatic speech recognition system that uses a missing data approach to compensate for environmental noise. The missing, noise-corrupted components are identified using either binaural features or a support vector machine (SVM) classifier. To perform speech recognition on the partially observed data, the missing components are replaced with clean speech estimates calculated using sparse imputation. Evaluated on the CHiME reverberant multisource environment corpus, the missing data approach significantly improved keyword recognition accuracy in moderate and poor SNR conditions. The best results were achieved when the missing components were identified using the binaural features and the clean speech estimates were associated with observation uncertainty estimates.
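
The imputation step summarized above can be illustrated with a small sketch. This is not the authors' implementation: the function name `sparse_impute`, the parameters `lam` and `iters`, and the toy dictionary are all hypothetical, and the solver is a simple nonnegative multiplicative-update rule with a sparsity penalty, swapped in here as an assumption rather than taken from the paper. The idea shown is only the general one: fit a sparse combination of clean-speech exemplars to the reliable components, then use that combination to fill in the components marked missing by the mask.

```python
def sparse_impute(y, mask, atoms, lam=0.01, iters=200):
    """Fill unreliable entries of y from a sparse, nonnegative mix of atoms.

    y     : observed (noisy) feature vector, nonnegative floats
    mask  : 1 for reliable entries, 0 for noise-corrupted (missing) ones
    atoms : clean-speech exemplar vectors, each the same length as y
    lam   : sparsity penalty weight (assumed value, for illustration only)
    """
    reliable = [i for i, m in enumerate(mask) if m]
    n_atoms = len(atoms)
    x = [1.0] * n_atoms  # activations; the update rule keeps them nonnegative
    eps = 1e-12          # guards against division by zero

    for _ in range(iters):
        # Reconstruction of the reliable entries from the current activations.
        recon = {i: sum(x[k] * atoms[k][i] for k in range(n_atoms))
                 for i in reliable}
        # Multiplicative update, computed on reliable entries only.
        for k, atom in enumerate(atoms):
            num = sum(atom[i] * y[i] for i in reliable)
            den = sum(atom[i] * recon[i] for i in reliable) + lam + eps
            x[k] *= num / den

    # Full-length clean-speech estimate from the fitted activations.
    estimate = [sum(x[k] * atoms[k][i] for k in range(n_atoms))
                for i in range(len(y))]
    # Keep reliable observations, replace the missing ones with the estimate.
    imputed = [y[i] if mask[i] else estimate[i] for i in range(len(y))]
    return imputed, x


# Toy example: the third component is noise-corrupted and masked out.
atoms = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
y = [2.0, 0.0, 9.0, 0.0]
mask = [1, 1, 0, 1]
imputed, activations = sparse_impute(y, mask, atoms)
```

In this toy case the reliable components are explained by the first exemplar alone, so the masked component is replaced by a value close to what that exemplar predicts instead of the corrupted observation. The sparsity penalty biases the estimate slightly downward.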

Index Terms. noise robust, speech recognition, binaural, SVM, sparse imputation, observation uncertainties

Bibliographic reference.  Kallasjoki, Heikki / Keronen, Sami / Brown, Guy J. / Gemmeke, Jort F. / Remes, Ulpu / Palomäki, Kalle J. (2011): "Mask estimation and sparse imputation for missing data speech recognition in multisource reverberant environments", In CHiME-2011, 58-63.