DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition

Kang Hyun Lee, Tae Gyoon Kang, Woo Hyun Kang, Nam Soo Kim


Ever since deep neural networks (DNNs) were introduced to the speech signal processing community, the recognition performance of automatic speech recognition (ASR) systems has improved greatly. This achievement has also increased demand for applications operating in distant-talking environments. However, ASR performance in such environments still falls far short of that in close-talking conditions, owing to problems such as reverberation and background noise. In this paper, we propose a novel multichannel feature-mapping technique that combines a conventional beamformer, a DNN, and a joint training scheme. Experiments on the multichannel Wall Street Journal audio-visual (MC-WSJ-AV) corpus show that, by employing an intermediate target, the proposed technique effectively models the complicated relationship between the array inputs and clean speech features. The proposed method outperformed a conventional DNN-based system.
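The abstract describes a two-stage idea: an enhancement DNN is first trained toward an intermediate target (clean speech features), and is then jointly fine-tuned with the back-end network so that recognition gradients flow through both. The toy sketch below illustrates only that training scheme, not the paper's actual architecture: the dimensions, single-layer "networks", and random data are all hypothetical stand-ins, and plain NumPy gradient descent replaces whatever optimizer the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper)
D_in, D_feat, D_out = 8, 6, 4   # stacked array input, clean-feature dim, output classes
N = 64                          # number of frames

X = rng.normal(size=(N, D_in))          # beamformed multichannel features (synthetic)
T_clean = rng.normal(size=(N, D_feat))  # intermediate target: clean speech features
y = rng.integers(0, D_out, size=N)      # frame-level labels for the acoustic model

# Single linear layers stand in for the deep networks, to keep gradients short.
W_enh = rng.normal(scale=0.1, size=(D_in, D_feat))   # enhancement "net"
W_am = rng.normal(scale=0.1, size=(D_feat, D_out))   # acoustic-model "net"

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lr = 0.05

# Stage 1: pre-train the enhancement net on the intermediate (clean-feature) target.
mse0 = np.mean((X @ W_enh - T_clean) ** 2)
for _ in range(200):
    F = X @ W_enh
    W_enh -= lr * (X.T @ (F - T_clean) / N)   # MSE gradient step
mse_pre = np.mean((X @ W_enh - T_clean) ** 2)

# Stage 2: joint training -- cross-entropy gradients flow through BOTH nets,
# so the enhancement front-end is refined for the recognition objective.
for _ in range(200):
    F = X @ W_enh
    P = softmax(F @ W_am)
    dlogits = P.copy()
    dlogits[np.arange(N), y] -= 1.0           # d(cross-entropy)/d(logits)
    dlogits /= N
    W_am -= lr * (F.T @ dlogits)              # back-end update
    W_enh -= lr * (X.T @ (dlogits @ W_am.T))  # front-end update via chain rule

acc = np.mean(np.argmax(softmax((X @ W_enh) @ W_am), axis=1) == y)
```

The key point the sketch makes concrete is the second loop: without joint training, `W_enh` would stay frozen at its stage-1 (MSE-optimal) value, which is not necessarily optimal for recognition.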


DOI: 10.21437/Interspeech.2016-105

Cite as

Lee, K.H., Kang, T.G., Kang, W.H., Kim, N.S. (2016) DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition. Proc. Interspeech 2016, 3027-3031.

BibTeX
@inproceedings{Lee+2016,
  author={Kang Hyun Lee and Tae Gyoon Kang and Woo Hyun Kang and Nam Soo Kim},
  title={DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-105},
  url={http://dx.doi.org/10.21437/Interspeech.2016-105},
  pages={3027--3031}
}