ISCA Archive Interspeech 2015

The IBM BOLT speech transcription system

Samuel Thomas, George Saon, Hong-Kwang J. Kuo, Lidia Mangu

We describe the IBM automatic speech recognition (ASR) system for the DARPA Broad Operational Language Translation (BOLT) program. The system transcribes conversational telephone speech (CTS) prior to machine translation for Phase 3 of the program's Activity A. The ASR system combines novel sequence-trained ensemble deep neural network acoustic models on speaker-adapted features with convolutional neural network models on two kinds of spectro-temporal representations of speech, in conjunction with a variety of class-based, neural network and n-gram language models. Acoustic and language models for the recognition system are built on transcribed audio released under the program and are further optimized for the final machine translation task. The evaluation system has a word error rate of 32.7% on a 2-hour Egyptian Arabic development set for this task.

doi: 10.21437/Interspeech.2015-634

Cite as: Thomas, S., Saon, G., Kuo, H.-K.J., Mangu, L. (2015) The IBM BOLT speech transcription system. Proc. Interspeech 2015, 3150-3153, doi: 10.21437/Interspeech.2015-634

  author={Samuel Thomas and George Saon and Hong-Kwang J. Kuo and Lidia Mangu},
  title={{The IBM BOLT speech transcription system}},
  booktitle={Proc. Interspeech 2015},