Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling

Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Masahito Togami


We investigate the use of local Gaussian modeling (LGM) based source separation to improve speech recognition accuracy. Previous studies have shown that LGM based source separation can be successfully applied both to runtime speech enhancement and to enhancement of the training data for deep neural network (DNN) based acoustic modeling. In this paper, we propose a data augmentation method that exploits the multi-input multi-output (MIMO) characteristic of LGM based source separation. We first compare unprocessed multi-microphone signals with the multi-channel output signals of LGM based source separation as augmented training data for DNN based acoustic modeling. Experimental results on the third CHiME challenge dataset show that the proposed data augmentation outperforms conventional data augmentation. In addition, we experiment with beamforming applied to the source-separated signals as runtime speech enhancement. The results show that the proposed runtime beamforming further improves speech recognition accuracy.
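The core idea of the proposed augmentation can be sketched as follows: because LGM based source separation is MIMO, it yields one enhanced signal per microphone, and these separated channels can be pooled with the raw microphone channels to enlarge the DNN training set. The sketch below is only illustrative; `lgm_separate` is a hypothetical stand-in (an identity stub), not the authors' separation algorithm.

```python
# Illustrative sketch of MIMO-based data augmentation (not the paper's code).
# A MIMO separator produces one output per input channel; pooling raw and
# separated channels multiplies the training data for each utterance.

def lgm_separate(channels):
    """Hypothetical stand-in for LGM based MIMO source separation.
    Returns one "enhanced" signal per input channel (identity stub)."""
    return [list(ch) for ch in channels]

def augment(utterance_channels):
    """Pool unprocessed microphone signals with the separated outputs."""
    separated = lgm_separate(utterance_channels)
    return utterance_channels + separated

# Two toy microphone channels for one utterance.
mics = [[0.1, 0.2, -0.1], [0.0, 0.3, -0.2]]
train = augment(mics)
print(len(train))  # 4 training signals from 2 microphones
```

In the paper's setting, the separated channels are produced by the actual LGM separator, and at runtime beamforming is additionally applied to the separated signals.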


DOI: 10.21437/Interspeech.2016-733

Cite as

Fujita, Y., Takashima, R., Homma, T., Togami, M. (2016) Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling. Proc. Interspeech 2016, 3818-3822.

Bibtex
@inproceedings{Fujita+2016,
author={Yusuke Fujita and Ryoichi Takashima and Takeshi Homma and Masahito Togami},
title={Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-733},
url={http://dx.doi.org/10.21437/Interspeech.2016-733},
pages={3818--3822}
}