This paper presents a set of techniques that we used to develop the language identification (LID) system for the second phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded radio communication channels. We report significant gains from (a) improved speech activity detection, (b) special handling of training data to enhance performance on short-duration audio samples, and (c) noise-robust feature extraction and normalization methods, including the use of multi-layer perceptron (MLP)-based phoneme posteriors. We show that on this type of noisy data, these techniques together provide an average 27% relative improvement in equal error rate (EER) across several test-duration conditions.
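To make the metric concrete, here is a minimal sketch of how an EER and a relative EER improvement can be computed from detection scores. It sweeps a threshold over the observed scores and takes the operating point where the false-acceptance and false-rejection rates are closest. The function names `eer` and `relative_improvement`, the simple threshold sweep, and all example numbers are hypothetical illustrations. This is not the RATS scoring tool or the authors' evaluation code, and the numbers are not results from the paper.

```python
def eer(target_scores, nontarget_scores):
    """Approximate equal error rate via a threshold sweep (hypothetical helper).

    Assumes higher scores mean "more likely the target language" and that
    both lists are non-empty.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None  # (|FAR - FRR|, EER estimate at that threshold)
    for t in thresholds:
        # False rejection: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: non-target trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.27 means a 27% relative improvement."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores only: overlapping target / non-target distributions.
    print(eer([0.6, 0.7, 0.8, 0.4], [0.3, 0.5, 0.2, 0.1]))  # 0.25
    # Hypothetical EERs (in %): 10.0 -> 7.3 is a 27% relative reduction.
    print(relative_improvement(10.0, 7.3))
```

A threshold sweep like this is the simplest exact-on-the-data approach. Evaluation toolkits often interpolate along the DET/ROC curve instead, which can give slightly different values on small trial sets.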
Bibliographic reference. Ma, Jeff / Zhang, Bing / Matsoukas, Spyros / Mallidi, Sri Harish / Li, Feipeng / Hermansky, Hynek (2013): "Improvements in language identification on the RATS noisy speech corpus", In INTERSPEECH-2013, 69-73.