Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification

Yan Song, Ruilian Cui, Ian McLoughlin, Lirong Dai


Recently, the i-vector representation based on a deep bottleneck network (DBN) pre-trained for automatic speech recognition has received significant interest for both speaker verification (SV) and language identification (LID). In previous work, we presented a unified DBN-based i-vector framework, referred to as DBN-pGMM i-vector [1]. In this paper, we replace the pGMM with a phonetic mixture of factor analyzers (pMFA), and propose a new DBN-pMFA i-vector. The DBN-pMFA i-vector includes the following improvements over the previous one: 1) a pMFA model is derived from the DBN, which can jointly perform feature dimension reduction and de-correlation in a single linear transformation; 2) a shifted DBF, termed SDBF, is proposed to exploit temporal contextual information; and 3) a senone selection scheme is proposed to make i-vector extraction more efficient. We evaluate the proposed DBN-pMFA i-vector on the six most confused languages selected from NIST LRE 2009. The experimental results demonstrate that DBN-pMFA can consistently outperform the previous DBN-based framework [1]. The computational complexity can be significantly reduced by applying a simple senone selection scheme.
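The key property claimed for pMFA is that a single linear transformation both reduces feature dimension and de-correlates the features. A minimal sketch of that idea, using PCA whitening as a stand-in (this is an illustrative analogy, not the authors' pMFA estimation; the data, dimensions, and variable names here are all assumptions):

```python
import numpy as np

# Hypothetical illustration: one matrix W that simultaneously reduces
# dimension and de-correlates features, analogous in spirit to the role
# of the pMFA loading matrix described in the abstract.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8)) @ rng.standard_normal((8, 8))  # correlated features

mu = X.mean(axis=0)
cov = np.cov(X - mu, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

k = 3  # assumed target dimension for this sketch
top = np.argsort(eigvals)[::-1][:k]
W = eigvecs[:, top] / np.sqrt(eigvals[top])  # reduce + whiten in one transform

Y = (X - mu) @ W
print(Y.shape)  # → (1000, 3)
print(np.allclose(np.cov(Y, rowvar=False), np.eye(k), atol=1e-6))  # → True
```

After projection, the sample covariance of `Y` is the identity, i.e. the reduced features are uncorrelated with unit variance, which is the joint effect the abstract attributes to the pMFA transformation.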


DOI: 10.21437/Odyssey.2016-20

Cite as

Song, Y., Cui, R., McLoughlin, I., Dai, L. (2016) Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification. Proc. Odyssey 2016, 140-145.

BibTeX
@inproceedings{Song+2016,
author={Yan Song and Ruilian Cui and Ian McLoughlin and Lirong Dai},
title={Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification},
year=2016,
booktitle={Odyssey 2016},
doi={10.21437/Odyssey.2016-20},
url={http://dx.doi.org/10.21437/Odyssey.2016-20},
pages={140--145}
}