Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars

Omid Ghahabi, Antonio Bonafonte, Javier Hernando, Asunción Moreno


This paper focuses on the application of Language Identification (LID) technology to intelligent vehicles. We deal with short sentences or words spoken in moving cars in four languages: English, Spanish, German, and Finnish. Since the response time of the LID system is crucial for user acceptance in this particular task, speech signals of different durations, with an overall average duration of 3.8 s, are analyzed. We propose the use of Deep Neural Networks (DNNs) to effectively model the i-vector space of languages. Both raw i-vectors and session-variability-compensated i-vectors are evaluated as DNN inputs. The performance of the proposed DNN architecture is compared with that of conventional GMM-UBM and i-vector/LDA systems, taking the effect of signal duration into account. It is shown that signals with durations between 2 and 3 s meet the requirements of this application, i.e., high accuracy and fast decision. In this duration range, the proposed DNN architecture outperforms the GMM-UBM and i-vector/LDA systems by 37% and 28%, respectively.
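To illustrate what "modeling the i-vector space of languages with a DNN" can look like at inference time, here is a minimal sketch of a fully connected network. It maps a single i-vector to posterior probabilities over the four languages. This is not the authors' architecture, which the abstract does not specify. Several details are hypothetical: the 400-dimensional i-vector, the two 256-unit ReLU hidden layers, the length normalization step, and the helper names (`init_dnn`, `dnn_posteriors`). The weights are randomly initialized, so the printed decision is meaningless. In practice they would be learned from labeled training i-vectors.

```python
import numpy as np

LANGUAGES = ["English", "Spanish", "German", "Finnish"]

def length_normalize(x):
    """Scale a vector to unit Euclidean norm (a common i-vector preprocessing step; assumed here)."""
    return x / np.linalg.norm(x)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def init_dnn(layer_sizes, rng):
    """Hypothetical helper: random (untrained) weights and zero biases for each dense layer."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def dnn_posteriors(ivector, params):
    """Forward pass: ReLU hidden layers followed by a softmax output over the languages."""
    h = length_normalize(ivector)
    for W, b in params[:-1]:
        h = relu(W @ h + b)
    W_out, b_out = params[-1]
    return softmax(W_out @ h + b_out)

rng = np.random.default_rng(0)
IVECTOR_DIM = 400  # hypothetical i-vector dimension
params = init_dnn([IVECTOR_DIM, 256, 256, len(LANGUAGES)], rng)
ivector = rng.standard_normal(IVECTOR_DIM)  # stand-in for an extracted i-vector
post = dnn_posteriors(ivector, params)
decision = LANGUAGES[int(np.argmax(post))]
print(decision, post.round(3))
```

The language decision is simply the class with the highest posterior. A session-variability-compensated input, such as an LDA-projected i-vector, would only change the input dimension of the first layer.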


DOI: 10.21437/Interspeech.2016-1045

Cite as

Ghahabi, O., Bonafonte, A., Hernando, J., Moreno, A. (2016) Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars. Proc. Interspeech 2016, 367-371.

Bibtex
@inproceedings{Ghahabi+2016,
author={Omid Ghahabi and Antonio Bonafonte and Javier Hernando and Asunción Moreno},
title={Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1045},
url={http://dx.doi.org/10.21437/Interspeech.2016-1045},
pages={367--371}
}