ISCA Archive Interspeech 2013

Reverberant speech recognition based on denoising autoencoder

Takaaki Ishii, Hiroki Komiyama, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa

A denoising autoencoder is applied to reverberant speech recognition as a noise-robust front-end that reconstructs the clean speech spectrum from noisy input. To capture context effects of speech sounds, a window of multiple short-windowed spectral frames is concatenated to form a single input vector. Additionally, a combination of short- and long-term spectra is investigated to properly handle the long impulse response of reverberation while keeping the time resolution necessary for speech recognition. Experiments are performed using the CENSREC-4 dataset, which is designed as an evaluation framework for distant-talking speech recognition. Experimental results show that the proposed denoising-autoencoder-based front-end using the short-windowed spectra gives better results than conventional methods. Combining it with the long-term spectra yields further improvement. The recognition accuracy of the proposed method using the short- and long-term spectra is 97.0% on the open-condition test set of the dataset, compared with 87.8% for a multi-condition-training-based baseline. As a supplemental experiment, large-vocabulary speech recognition is also performed, confirming the effectiveness of the proposed method.
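The concatenation of neighbouring spectral frames into a single input vector can be illustrated with a minimal sketch. The function name `splice_frames`, the `left` and `right` context sizes, the edge-padding strategy, and the toy data are all assumptions made for illustration; the abstract does not specify the window size or how utterance boundaries are handled.

```python
def splice_frames(frames, left, right):
    """Concatenate each frame with its neighbours into one input vector.

    frames: list of equal-length spectral frames (lists of floats).
    left, right: number of context frames on each side (hypothetical
    parameters; the abstract does not state the window size).
    Edges are padded by repeating the first or last frame (an assumption).
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        vector = []
        for offset in range(-left, right + 1):
            # Clamp the index so edge frames reuse the nearest valid frame.
            idx = min(max(t + offset, 0), last)
            vector.extend(frames[idx])
        spliced.append(vector)
    return spliced


# Toy example: 4 frames with 2 spectral bins each, one context frame per side.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
inputs = splice_frames(frames, left=1, right=1)
```

Each resulting vector has `(left + 1 + right)` times the dimension of a single frame. In a front-end of the kind described, such vectors would be the noisy inputs to the autoencoder, with the corresponding clean spectra as training targets.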


doi: 10.21437/Interspeech.2013-267

Cite as: Ishii, T., Komiyama, H., Shinozaki, T., Horiuchi, Y., Kuroiwa, S. (2013) Reverberant speech recognition based on denoising autoencoder. Proc. Interspeech 2013, 3512-3516, doi: 10.21437/Interspeech.2013-267

@inproceedings{ishii13_interspeech,
  author={Takaaki Ishii and Hiroki Komiyama and Takahiro Shinozaki and Yasuo Horiuchi and Shingo Kuroiwa},
  title={{Reverberant speech recognition based on denoising autoencoder}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3512--3516},
  doi={10.21437/Interspeech.2013-267}
}