This paper presents a unified i-vector framework for language identification (LID) based on deep bottleneck networks (DBN) trained for automatic speech recognition (ASR). The framework covers both the front-end feature extraction and back-end modeling stages. The outputs from different layers of a DBN are exploited to improve the effectiveness of the i-vector representation by incorporating a mixture of acoustic and phonetic information. Furthermore, a universal model is derived from the DBN using a LID corpus. This is a somewhat inverse process to the GMM-UBM method, in which the GMM of each language is mapped from a GMM-UBM. Evaluations on specific dialect recognition tasks show that the DBN-based i-vector can achieve significant and consistent performance gains over conventional GMM-UBM and DNN-based i-vector methods. The generalization capability of this framework is also evaluated using DBNs trained on Mandarin and English corpora.
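The abstract does not spell out how the outputs of different layers are combined. As a minimal sketch, assuming one common arrangement in network-based i-vector systems, frame posteriors taken from the output layer play the role of the occupancy weights, and bottleneck-layer activations serve as the feature vectors when accumulating the zeroth- and first-order statistics that an i-vector extractor consumes. The function name `sufficient_stats`, the layer assignment, and the toy data are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sufficient_stats(posteriors: Matrix, features: Matrix) -> Tuple[List[float], Matrix]:
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors[t][c]: posterior of class c at frame t (assumed here to come
                      from the network's output layer).
    features[t][d]:   feature vector at frame t (assumed here to come from
                      the bottleneck layer).
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("posteriors and features must cover the same, non-empty frames")
    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes                       # N_c = sum_t gamma_t(c)
    first = [[0.0] * dim for _ in range(num_classes)]  # F_c = sum_t gamma_t(c) * x_t
    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma
            for d, value in enumerate(x_t):
                first[c][d] += gamma * value
    return zeroth, first


if __name__ == "__main__":
    # Toy data (hypothetical): 3 frames, 2 classes, 2-dimensional features.
    toy_posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    toy_features = [[2.0, 4.0], [1.0, 1.0], [3.0, 5.0]]
    n, f = sufficient_stats(toy_posteriors, toy_features)
    print(n)  # zeroth-order statistics, one value per class
    print(f)  # first-order statistics, one vector per class
```

In a GMM-UBM system the same two statistics would be accumulated with posteriors computed from the GMM components; only the source of the posteriors and features changes in this sketch.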
Bibliographic reference. Song, Yan / Hong, Xinhai / Jiang, Bing / Cui, Ruilian / McLoughlin, Ian / Dai, Li-Rong (2015): "Deep bottleneck network based i-vector representation for language identification", In INTERSPEECH-2015, 398-402.