16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Improving Deep Neural Networks Based Multi-Accent Mandarin Speech Recognition Using I-Vectors and Accent-Specific Top Layer

Mingming Chen (1), Zhanlei Yang (1), Jizhong Liang (2), Yanpeng Li (2), Wenju Liu (1)

(1) Chinese Academy of Sciences, China
(2) SGCC, China

In this paper, we propose a method that uses i-vectors and model adaptation techniques to improve the performance of deep neural network (DNN) based multi-accent Mandarin speech recognition. I-vectors, which are speaker-specific features, have been shown to be effective for accent identification. They can be used together with conventional spectral features as the input features of DNNs to improve the discrimination between different accents. Meanwhile, we adapt DNNs to different accents by using an accent-specific top layer and shared hidden layers. The accent-specific top layer adapts the model to each accent, while the shared hidden layers, which can be seen as feature extractors, extract high-level features that discriminate between accents. These two techniques are complementary and can be easily combined. Our experiments on the 400-hour Intel Accented Mandarin Speech Recognition Corpus show that the proposed method significantly improves the performance of DNN-based accented Mandarin speech recognition.
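The two ideas in the abstract can be illustrated with a minimal sketch: an i-vector is appended to each spectral feature frame, and the network has a stack of hidden layers shared by all accents plus one output (top) layer per accent. This assumes a plain feed-forward network with sigmoid hidden units and a softmax output. The layer sizes, the activation, the names `concat_ivector` and `MultiAccentDNN`, and the accent labels are hypothetical illustrations, not the authors' configuration. Only the forward pass is shown; training is omitted.

```python
import math
import random


def concat_ivector(frames, ivector):
    """Append the same i-vector to every spectral frame (assumed: one i-vector per utterance/speaker)."""
    return [list(f) + list(ivector) for f in frames]


def init_layer(n_in, n_out, rng):
    """Small random weights (n_out rows of n_in) and zero biases."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def affine(x, layer):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def sigmoid(v):
    # Numerically stable logistic function.
    out = []
    for z in v:
        if z >= 0:
            out.append(1.0 / (1.0 + math.exp(-z)))
        else:
            e = math.exp(z)
            out.append(e / (1.0 + e))
    return out


def softmax(v):
    m = max(v)
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]


class MultiAccentDNN:
    """Hypothetical model: hidden layers shared across accents, one top layer per accent."""

    def __init__(self, n_in, n_hidden, n_layers, n_states, accents, seed=0):
        rng = random.Random(seed)
        dims = [n_in] + [n_hidden] * n_layers
        # Shared hidden layers act as a common feature extractor for all accents.
        self.shared = [init_layer(a, b, rng) for a, b in zip(dims, dims[1:])]
        # Accent-specific output layers, one per accent label.
        self.heads = {acc: init_layer(n_hidden, n_states, rng) for acc in accents}

    def hidden(self, x):
        h = x
        for layer in self.shared:
            h = sigmoid(affine(h, layer))
        return h

    def forward(self, x, accent):
        # Same shared representation, routed to the top layer of the given accent.
        return softmax(affine(self.hidden(x), self.heads[accent]))
```

For example, with hypothetical 4-dimensional frames and a 2-dimensional i-vector, `x = concat_ivector(frames, ivec)` yields 6-dimensional inputs, and `MultiAccentDNN(6, 8, 2, 5, ["accent_a", "accent_b"]).forward(x[0], "accent_a")` returns a 5-way posterior. Under this design, training on data from one accent would update the shared layers and only that accent's top layer, so the hidden layers see all accents while each output layer specializes.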


Bibliographic reference.  Chen, Mingming / Yang, Zhanlei / Liang, Jizhong / Li, Yanpeng / Liu, Wenju (2015): "Improving deep neural networks based multi-accent Mandarin speech recognition using i-vectors and accent-specific top layer", In INTERSPEECH-2015, 3620-3624.