Recently, the i-vector representation based on a deep bottleneck network (DBN) pre-trained for automatic speech recognition has received significant interest for both speaker verification (SV) and language identification (LID). In previous work, we presented a unified DBN-based i-vector framework, referred to as the DBN-pGMM i-vector [1]. In this paper, we replace the pGMM with a phonetic mixture of factor analyzers (pMFA) and propose a new DBN-pMFA i-vector. The DBN-pMFA i-vector improves on the previous framework in three ways: 1) a pMFA model is derived from the DBN, which jointly performs feature dimension reduction and de-correlation in a single linear transformation; 2) a shifted deep bottleneck feature (DBF), termed SDBF, is proposed to exploit temporal contextual information; and 3) a senone selection scheme is proposed to make i-vector extraction more efficient. We evaluate the proposed DBN-pMFA i-vector on the six most confusable languages selected from NIST LRE 2009. The experimental results demonstrate that DBN-pMFA consistently outperforms the previous DBN-based framework [1], and that the computational complexity can be significantly reduced by applying a simple senone selection scheme.
Cite as: Song, Y., Cui, R., Ian, M., Dai, L. (2016) Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2016), 140-145, doi: 10.21437/Odyssey.2016-20
@inproceedings{song16_odyssey,
  author={Yan Song and Ruilian Cui and Mcloughlin Ian and Lirong Dai},
  title={{Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification}},
  year=2016,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2016)},
  pages={140--145},
  doi={10.21437/Odyssey.2016-20}
}