ISCA Archive Interspeech 2013

Neural network acoustic models for the DARPA RATS program

Hagen Soltau, Hong-Kwang Kuo, Lidia Mangu, George Saon, Tomas Beran

We present a comparison of acoustic modeling techniques for the DARPA RATS program in the context of spoken term detection (STD) on speech data with severe channel distortions. Our main finding is that both Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) outperform Gaussian Mixture Models (GMMs) on a very difficult LVCSR task. We discuss pretraining, feature sets, and training procedures, as well as weight sharing and shift invariance to increase robustness against channel distortions. We obtained an error rate reduction of about 20% over our state-of-the-art GMM system. Additionally, we found that CNNs work very well for spoken term detection, as a result of better lattice oracle rates compared to GMMs and MLPs.

doi: 10.21437/Interspeech.2013-674

Cite as: Soltau, H., Kuo, H.-K., Mangu, L., Saon, G., Beran, T. (2013) Neural network acoustic models for the DARPA RATS program. Proc. Interspeech 2013, 3092-3096, doi: 10.21437/Interspeech.2013-674

  author={Hagen Soltau and Hong-Kwang Kuo and Lidia Mangu and George Saon and Tomas Beran},
  title={{Neural network acoustic models for the DARPA RATS program}},
  booktitle={Proc. Interspeech 2013},