14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

A Lecture Transcription System Combining Neural Network Acoustic and Language Models

Peter Bell (1), Hitoshi Yamamoto (2), Pawel Swietojanski (1), Youzheng Wu (2), Fergus McInnes (1), Chiori Hori (2), Steve Renals (1)

(1) University of Edinburgh, UK
(2) NICT, Japan

This paper presents a new system for automatic transcription of lectures. The system combines a number of novel features, including deep neural network acoustic models using multi-level adaptive networks to incorporate out-of-domain information, and factored recurrent neural network language models. We demonstrate that the system achieves large improvements on the TED lecture transcription task from the 2012 IWSLT evaluation. Our results are currently the best reported on this task, showing a relative word error rate (WER) reduction of more than 16% compared to the closest competing system from the evaluation.
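
For readers unfamiliar with the metric, a relative WER reduction is conventionally the drop in WER divided by the baseline WER. The sketch below illustrates that arithmetic only. The function name and the example figures are hypothetical and are not taken from the paper or from the IWSLT evaluation.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical figures for illustration only (not results from the paper):
# a baseline at 20.0% WER and a new system at 16.0% WER.
example = relative_wer_reduction(20.0, 16.0)
print(f"{example:.1%} relative reduction")
```

A relative reduction should not be confused with the absolute difference: in the hypothetical case above the absolute drop is 4 percentage points, while the relative reduction is one fifth of the baseline.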

Bibliographic reference. Bell, Peter / Yamamoto, Hitoshi / Swietojanski, Pawel / Wu, Youzheng / McInnes, Fergus / Hori, Chiori / Renals, Steve (2013): "A lecture transcription system combining neural network acoustic and language models", In INTERSPEECH-2013, 3087-3091.