In our system submitted to the CHiME-5 challenge, we propose front-end enhancement of the beamformed array utterances to mitigate the mismatch between close-talking utterances and array utterances. Our initial experiments showed that an acoustic model trained using only close-talking microphone utterances outperformed the baseline acoustic model when tested on close-talking utterances of the development set. Motivated by this, we explored the hypothesis that if array utterances are mapped to their corresponding close-talking utterances, a system trained using only worn-microphone utterances will perform better. Towards this end, we trained a Time Delay Neural Network Denoising Autoencoder (TDNN-DAE) using non-overlapping close-talking microphone utterances (targets) and their corresponding beamformed utterances (inputs). However, the proposed system could not outperform the baseline.
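The core mapping idea described above — splicing a temporal context of beamformed feature frames and regressing toward the matching close-talk frames — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimension, context width, hidden size, and weights are all hypothetical assumptions.

```python
# Minimal sketch of a time-delay denoising autoencoder (TDNN-DAE) forward pass.
# All dimensions and parameters are illustrative assumptions, not the paper's
# actual configuration.
import numpy as np

rng = np.random.default_rng(0)

def splice(feats, context=2):
    """Stack each frame with its +/-context neighbours (edge frames repeated),
    forming the time-delay input window of a TDNN layer."""
    T, D = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].ravel() for t in range(T)])

def relu(x):
    return np.maximum(0.0, x)

# Toy dimensions (assumptions): 40-dim features, 100 frames, context +/-2.
D, T, context, H = 40, 100, 2, 64
W1 = rng.standard_normal(((2 * context + 1) * D, H)) * 0.01
b1 = np.zeros(H)
W2 = rng.standard_normal((H, D)) * 0.01
b2 = np.zeros(D)

def dae_forward(noisy):
    """Map beamformed (noisy) features toward close-talk (clean) targets."""
    h = relu(splice(noisy, context) @ W1 + b1)
    return h @ W2 + b2

noisy = rng.standard_normal((T, D))   # stand-in for beamformed features
clean = rng.standard_normal((T, D))   # stand-in for close-talk targets
enhanced = dae_forward(noisy)
mse = np.mean((enhanced - clean) ** 2)  # regression objective minimised in training
```

At test time, `dae_forward` would be applied to beamformed array features before passing them to the acoustic model trained on close-talk data.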
Cite as: Joshi, S., Panda, A., Soni, M., Chakraborty, R., Kopparapu, S., Mohanan, N., Nayak, P., Velmurugan, R., Rao, P. (2018) CHiME 2018 Workshop: Enhancing beamformed audio using time delay neural network denoising autoencoder. Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018), 46-48, doi: 10.21437/CHiME.2018-10
@inproceedings{joshi18_chime,
  author={Sonal Joshi and Ashish Panda and Meet Soni and Rupayan Chakraborty and Sunilkumar Kopparapu and Nikhil Mohanan and Premanand Nayak and Rajbabu Velmurugan and Preeti Rao},
  title={{CHiME 2018 Workshop: Enhancing beamformed audio using time delay neural network denoising autoencoder}},
  year={2018},
  booktitle={Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018)},
  pages={46--48},
  doi={10.21437/CHiME.2018-10}
}