15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Improving Deep Neural Network Acoustic Modeling for Audio Corpus Indexing Under the IARPA Babel Program

Xiaodong Cui (1), Brian Kingsbury (1), Jia Cui (1), Bhuvana Ramabhadran (1), Andrew Rosenberg (2), Mohammad Sadegh Rasooli (3), Owen Rambow (3), Nizar Habash (3), Vaibhava Goel (1)

(1) IBM T.J. Watson Research Center, USA
(2) CUNY Queens College, USA
(3) Columbia University, USA

This paper presents several techniques that improve deep neural network (DNN) acoustic modeling for audio corpus indexing in the context of the IARPA Babel program. Specifically, fundamental frequency variation (FFV) features, channel-aware (CA) features, and data augmentation based on stochastic feature mapping (SFM) are investigated, not only for improved automatic speech recognition (ASR) performance but also for their impact on the final spoken term detection over the pre-indexed audio corpus. Experimental results on the development languages of Babel option period one show that the improved DNN acoustic models reduce word error rates in ASR and also improve keyword search performance compared to already competitive DNN baseline systems.


Bibliographic reference. Cui, Xiaodong / Kingsbury, Brian / Cui, Jia / Ramabhadran, Bhuvana / Rosenberg, Andrew / Rasooli, Mohammad Sadegh / Rambow, Owen / Habash, Nizar / Goel, Vaibhava (2014): "Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA Babel program", In INTERSPEECH-2014, 2103-2107.