INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Robust Language Identification Using Convolutional Neural Network Features

Sriram Ganapathy (1), Kyu Han (1), Samuel Thomas (1), Mohamed Omar (1), Maarten Van Segbroeck (2), Shrikanth S. Narayanan (2)

(1) IBM T.J. Watson Research Center, USA
(2) University of Southern California, USA

The language identification (LID) task in the Robust Automatic Transcription of Speech (RATS) program is challenging because the audio data are collected over highly degraded radio communication channels and because short-duration speech segments are used for testing. In this paper, we report recent advances in the RATS LID task obtained by using bottleneck features from a convolutional neural network (CNN). The CNN, which is trained with labelled data from one of the target languages, generates bottleneck features that are used in a Gaussian mixture model (GMM) i-vector LID system. The CNN bottleneck features provide substantial complementary information to the conventional acoustic features, even for languages not seen in CNN training. Using these bottleneck features in conjunction with acoustic features, we obtain significant improvements on the LID task: an average relative improvement of 25% in equal error rate (EER) over the corresponding acoustic-feature system. Furthermore, these improvements are consistent across various choices of acoustic features and speech segment durations.
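The headline result above is stated as a relative reduction in EER. The sketch below shows one way to compute both quantities. It is only an illustration. The function names `equal_error_rate` and `relative_improvement` are hypothetical. The threshold sweep is a simple approximation: it picks the operating point where the false-acceptance and false-rejection rates are closest and averages them. It is not the official RATS scoring tool. The example scores are made up and do not come from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted when its score >= threshold. The EER is taken at the
    threshold where the false-acceptance rate (FAR) and false-rejection rate
    (FRR) are closest, averaging the two rates there.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        # Non-target trials wrongly accepted at this threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Target trials wrongly rejected at this threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction of a new system with respect to a baseline."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Hypothetical detection scores for one target language.
    targets = [0.9, 0.8, 0.7, 0.3]
    nontargets = [0.6, 0.4, 0.2, 0.1]
    print(equal_error_rate(targets, nontargets))
    # A hypothetical drop from 20% to 15% EER is a 25% relative improvement.
    print(relative_improvement(20.0, 15.0))
```

Under this convention, a 25% relative improvement means the combined system's EER is three quarters of the acoustic-only system's EER. It does not mean the EER drops by 25 absolute percentage points.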


Bibliographic reference.  Ganapathy, Sriram / Han, Kyu / Thomas, Samuel / Omar, Mohamed / Segbroeck, Maarten Van / Narayanan, Shrikanth S. (2014): "Robust language identification using convolutional neural network features", In INTERSPEECH-2014, 1846-1850.