15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Task-Aware Deep Bottleneck Features for Spoken Language Identification

Bing Jiang (1), Yan Song (1), Si Wei (2), Ian Vince McLoughlin (1), Li-Rong Dai (1)

(1) USTC, China
(2) Anhui USTC iFlytek, China

Recently, deep bottleneck features (DBF) extracted from a deep neural network (DNN) containing a narrow bottleneck layer have been applied to language identification (LID), yielding significant performance improvements over state-of-the-art methods on NIST LRE 2009. However, the DNN is trained on a large corpus of a specific language, which is not directly related to the LID task. More recently, lattice-based discriminative training methods for extracting more targeted DBF were proposed for automatic speech recognition (ASR). Inspired by this, this paper proposes to fine-tune the parameters of the trained DNN using an LID-specific training corpus, which may make the resulting DBF, termed Discriminative DBF (D2BF), more discriminative and task-aware. Specifically, the DNN parameters of the bottleneck layer are updated iteratively by gradient descent under the maximum mutual information (MMI) criterion. We evaluate the performance of the proposed D2BF using different back-end models, including GMM-MMI and i-vector, on the six most confusable languages selected from NIST LRE 2009. The results show that the proposed D2BF is more appropriate and effective than the original DBF.
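To make the MMI update concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a purely linear bottleneck projection `W` in place of the full DNN, a fixed back-end in which each language is a unit-variance Gaussian with mean `mus[k]` in bottleneck space (standing in for the GMM-MMI or i-vector back-ends), uniform language priors, and synthetic data. All function and variable names (`scores`, `mmi_objective`, `mmi_gradient`, `W0`, `mus`) are hypothetical. The MMI criterion here is F = sum over utterances u of [S_u(l_u) - log sum_k exp S_u(k)], the summed log posterior of each utterance's true language; its gradient with respect to each score is delta(k, l_u) - posterior(k), which the chain rule carries back into the bottleneck weights.

```python
import numpy as np


def scores(W, mus, X):
    """Per-language scores S[k] for one utterance X (frames x input_dim).

    Hypothetical back-end: language k is a unit-variance Gaussian with mean
    mus[k] in bottleneck space; S[k] is the frame-averaged log-likelihood
    with constant terms dropped.
    """
    Y = X @ W.T                                   # bottleneck features (frames x bn_dim)
    diff = Y[:, None, :] - mus[None, :, :]        # frames x langs x bn_dim
    return -0.5 * np.mean(np.sum(diff ** 2, axis=2), axis=0)


def log_softmax(s):
    s = s - np.max(s)                             # shift for numerical stability
    return s - np.log(np.sum(np.exp(s)))


def mmi_objective(W, mus, utts, labels):
    """MMI criterion with uniform language priors:
    F = sum_u [S_u(l_u) - log sum_k exp S_u(k)], i.e. the summed log posterior
    of each utterance's true language."""
    return sum(log_softmax(scores(W, mus, X))[l] for X, l in zip(utts, labels))


def mmi_gradient(W, mus, utts, labels):
    """dF/dW, propagated through the linear bottleneck by the chain rule."""
    grad = np.zeros_like(W)
    for X, l in zip(utts, labels):
        post = np.exp(log_softmax(scores(W, mus, X)))  # language posteriors
        coef = -post
        coef[l] += 1.0                            # dF/dS_u(k) = delta(k, l_u) - post[k]
        Y = X @ W.T
        for k, c in enumerate(coef):
            # dS_u(k)/dW = mean over frames of outer(mus[k] - y_t, x_t)
            grad += c * (mus[k] - Y).T @ X / X.shape[0]
    return grad


# Toy data: 3 languages, 12 utterances of 20 frames each (purely synthetic).
rng = np.random.default_rng(0)
in_dim, bn_dim, n_lang = 8, 3, 3
W0 = 0.1 * rng.standard_normal((bn_dim, in_dim))  # initial bottleneck weights
mus = rng.standard_normal((n_lang, bn_dim))        # fixed back-end means
centers = rng.standard_normal((n_lang, in_dim))
labels = [u % n_lang for u in range(12)]
utts = [centers[l] + 0.5 * rng.standard_normal((20, in_dim)) for l in labels]

W = W0.copy()
lr = 0.001
history = [mmi_objective(W, mus, utts, labels)]
for _ in range(200):
    W = W + lr * mmi_gradient(W, mus, utts, labels)  # ascend F (descend -F)
    history.append(mmi_objective(W, mus, utts, labels))
print(f"MMI objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because MMI is maximized, the loop takes gradient ascent steps on F, which is equivalent to gradient descent on -F. In the paper's setting the gradient would instead be back-propagated into the bottleneck layer of a full DNN, with a trained GMM-MMI or i-vector back-end; the toy Gaussian back-end is used only because its gradient can be written in a few lines.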


Bibliographic reference.  Jiang, Bing / Song, Yan / Wei, Si / McLoughlin, Ian Vince / Dai, Li-Rong (2014): "Task-aware deep bottleneck features for spoken language identification", In INTERSPEECH-2014, 3012-3016.