The system submitted to the CHiME-5 challenge focuses on building a better front-end for an automatic speech recognition (ASR) system trained on the data provided by CHiME-5. In this work, we use non-negative matrix factorization (NMF) based techniques for denoising and dereverberation. In Approach 1, the degraded single-channel speech utterances were enhanced using multi-channel weighted prediction error (WPE) or NMF, followed by a minimum variance distortionless response (MVDR) beamformer, to obtain an enhanced signal. In Approach 2, we used multi-channel MVDR followed by an NMF-based single-channel enhancement. With the baseline acoustic model (AM), these enhanced speech utterances did not improve the word error rate (WER) over the baseline BeamformIt-based system. We therefore retrained the AM on WPE-enhanced data (Approach 3). These approaches improved the ASR results compared to the baseline. We submit results for the single-array track and focus only on acoustic robustness (i.e., ranking A).
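The abstract does not state which NMF variant, cost function, or dictionary training the system uses. As a rough illustration of what NMF-based single-channel enhancement can look like, the following is a minimal sketch that assumes Euclidean-cost multiplicative updates on a magnitude spectrogram and a soft mask built from separate speech and noise bases. The function names `nmf` and `enhance`, the rank, and the iteration count are hypothetical and are not taken from the paper.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize a non-negative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost ||V - W H||^2.
    This is an assumed variant, not necessarily the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral bases
    H = rng.random((rank, n_frames)) + eps  # time activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def enhance(V, W_speech, W_noise, n_iter=200, eps=1e-10, seed=0):
    """Enhance magnitude spectrogram V given fixed speech and noise bases.

    Only the activations are updated; the bases are assumed to have been
    learned beforehand (e.g. by calling nmf on clean speech and on noise).
    """
    rng = np.random.default_rng(seed)
    W = np.hstack([W_speech, W_noise])
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    k = W_speech.shape[1]
    speech = W_speech @ H[:k]   # speech part of the model
    noise = W_noise @ H[k:]     # noise part of the model
    mask = speech / (speech + noise + eps)  # soft mask with values in [0, 1]
    return mask * V
```

In a full pipeline the enhanced magnitude would be recombined with the noisy phase and inverted back to a waveform; that step, and the multi-channel WPE and MVDR stages, are outside this sketch.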
Cite as: Mohanan, N., Nayak, P., Velmurugan, R., Rao, P., Joshi, S., Panda, A., Soni, M., Chakraborty, R., Kopparapu, S. (2018) NMF based front-end processing in multi-channel distant speech recognition. Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018), 70-75, doi: 10.21437/CHiME.2018-16
@inproceedings{mohanan18_chime,
  author={Nikhil Mohanan and Premanand Nayak and Rajbabu Velmurugan and Preeti Rao and Sonal Joshi and Ashish Panda and Meet Soni and Rupayan Chakraborty and Sunilkumar Kopparapu},
  title={{NMF based front-end processing in multi-channel distant speech recognition}},
  year=2018,
  booktitle={Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018)},
  pages={70--75},
  doi={10.21437/CHiME.2018-16}
}