Language Adaptive DNNs for Improved Low Resource Speech Recognition

Markus Müller, Sebastian Stüker, Alex Waibel

Deep Neural Network (DNN) acoustic models are commonly used in today’s state-of-the-art speech recognition systems. Because neural networks are a data-driven method, the amount of available training data directly affects their performance. Several studies have shown that multilingual training of DNNs leads to improvements, especially in resource-constrained tasks where only limited training data in the target language is available.

Previous studies have shown that speaker adaptation can be performed successfully on DNNs by adding speaker information (e.g. i-Vectors) as additional input features. Building on this idea, we present a method for adding language information to the input features of the network. Preliminary experiments showed improvements when supervised information about language identity was provided to the network.
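As a rough illustration of supplying supervised language identity as additional input features, the sketch below appends a one-hot language vector to every acoustic frame. The abstract does not say how the language identity was encoded, so the one-hot encoding, the function name `append_language_one_hot`, the 40-dimensional frames, and the four-language setup are all assumptions made for this example.

```python
import numpy as np

def append_language_one_hot(frames, lang_index, num_langs):
    """Append a one-hot language identity vector to every acoustic frame.

    frames:     array of shape (num_frames, feat_dim)
    lang_index: integer ID of the utterance's language (hypothetical labelling)
    num_langs:  number of languages in the multilingual training set
    """
    frames = np.asarray(frames, dtype=np.float32)
    if not 0 <= lang_index < num_langs:
        raise ValueError("lang_index out of range")
    one_hot = np.zeros(num_langs, dtype=np.float32)
    one_hot[lang_index] = 1.0
    # Repeat the same language vector for each frame of the utterance.
    tiled = np.tile(one_hot, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)

# Toy utterance: 5 frames of 40-dimensional features (assumed dimensions).
frames = np.random.default_rng(0).normal(size=(5, 40))
augmented = append_language_one_hot(frames, lang_index=2, num_langs=4)
print(augmented.shape)  # (5, 44)
```

The acoustic model would then be trained on the widened frames, in the same way that i-Vectors are appended for speaker adaptation.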

In this work, we extended this approach by training a neural network to encode language-specific features. We extracted these features in an unsupervised manner and used them to provide additional cues to the DNN acoustic model during training. Our results show that augmenting the acoustic input features with this language code enabled the network to better capture language-specific peculiarities, which improved the performance of systems trained on data from multiple languages.
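The following sketch shows one way a learned language code could be appended to the acoustic features. It is not the authors' implementation: the abstract does not describe the encoder's architecture, so the two-layer network, the per-frame code, the dimensions, and the random weights standing in for a trained encoder are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the abstract does not specify them.
FEAT_DIM, HIDDEN_DIM, CODE_DIM = 40, 64, 8

# Random weights stand in for a trained language-encoding network.
W1 = rng.normal(scale=0.1, size=(FEAT_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, CODE_DIM))
b2 = np.zeros(CODE_DIM)

def language_code(frames):
    """Map each acoustic frame to a low-dimensional language code."""
    hidden = np.tanh(frames @ W1 + b1)
    return np.tanh(hidden @ W2 + b2)

def augment_with_language_code(frames):
    """Concatenate the language code onto the acoustic features."""
    frames = np.asarray(frames, dtype=np.float64)
    code = language_code(frames)
    return np.concatenate([frames, code], axis=1)

# Toy utterance: 100 frames of 40-dimensional features.
utterance = rng.normal(size=(100, FEAT_DIM))
augmented = augment_with_language_code(utterance)
print(augmented.shape)  # (100, 48)
```

Unlike a supervised language label, a code of this kind is computed from the audio itself, so no language annotation is needed when the augmented features are built.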

DOI: 10.21437/Interspeech.2016-1143

Cite as

Müller, M., Stüker, S., Waibel, A. (2016) Language Adaptive DNNs for Improved Low Resource Speech Recognition. Proc. Interspeech 2016, 3878-3882.

author={Markus Müller and Sebastian Stüker and Alex Waibel},
title={Language Adaptive DNNs for Improved Low Resource Speech Recognition},
booktitle={Interspeech 2016},