The success of deep neural network (DNN) acoustic models is owed in part to the large amounts of training data available for different applications. This work investigates ways to improve DNN acoustic models for Bluetooth narrowband mobile applications when only relatively small amounts of in-domain training data are available. To address the challenge of limited in-domain data, we use cross-bandwidth and cross-lingual transfer learning to leverage knowledge from other domains with more training data (different bandwidths and/or languages). Specifically, narrowband DNNs in a target language are initialized using the weights of DNNs trained on band-limited wideband data in the same language, or of DNNs trained on a different (resource-rich) language. We investigate multiple recipes combining these methods with different data resources. For all languages in our experiments, these recipes achieve up to 45% relative word error rate (WER) reduction compared to training solely on the Bluetooth narrowband data in the target language. Furthermore, these recipes remain beneficial even when over two hundred hours of manually transcribed in-domain data are available, and they achieve better accuracy than the baselines with as little as 20 hours of in-domain data.
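To make the initialization scheme concrete, the following is a minimal sketch of cross-lingual weight transfer in PyTorch. It is illustrative only and not the paper's implementation: the network topology, layer sizes, and target counts are hypothetical. The hidden layers of a DNN trained on a resource-rich source language are copied into a target-language DNN, while the output layer (whose context-dependent state inventory is language-specific) keeps a fresh random initialization before fine-tuning on the in-domain narrowband data.

```python
# Sketch of cross-lingual DNN initialization (hypothetical sizes; not the
# paper's actual code or model configuration).
import torch
import torch.nn as nn

def make_dnn(input_dim: int, hidden_dim: int, num_layers: int,
             num_targets: int) -> nn.Sequential:
    """Feed-forward DNN acoustic model: stacked affine + sigmoid hidden
    layers followed by a linear output over context-dependent states."""
    layers = []
    dim = input_dim
    for _ in range(num_layers):
        layers += [nn.Linear(dim, hidden_dim), nn.Sigmoid()]
        dim = hidden_dim
    layers.append(nn.Linear(dim, num_targets))  # per-senone logits
    return nn.Sequential(*layers)

# Source model: assumed already trained on a resource-rich language
# and/or on band-limited wideband data.
source = make_dnn(input_dim=440, hidden_dim=1024, num_layers=5,
                  num_targets=8000)

# Target model: same topology, but its own output layer, since the
# target language has a different set of context-dependent states.
target = make_dnn(input_dim=440, hidden_dim=1024, num_layers=5,
                  num_targets=6000)

# Copy all layers except the final (language-specific) output layer.
with torch.no_grad():
    for src_layer, tgt_layer in zip(list(source)[:-1], list(target)[:-1]):
        if isinstance(src_layer, nn.Linear):
            tgt_layer.weight.copy_(src_layer.weight)
            tgt_layer.bias.copy_(src_layer.bias)

# The output layer keeps its random initialization; the whole network is
# then fine-tuned on the limited in-domain Bluetooth narrowband data.
```

In the same-language cross-bandwidth case, the output layer could in principle be transferred as well, since the state inventory is shared; the cross-lingual case shown above must re-learn it.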
Cite as: Zhuang, X., Ghoshal, A., Rosti, A.-V., Paulik, M., Liu, D. (2017) Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization. Proc. Interspeech 2017, 2148-2152, doi: 10.21437/Interspeech.2017-1129
@inproceedings{zhuang17_interspeech,
  author={Xiaodan Zhuang and Arnab Ghoshal and Antti-Veikko Rosti and Matthias Paulik and Daben Liu},
  title={{Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization}},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={2148--2152},
  doi={10.21437/Interspeech.2017-1129}
}