12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

On-Line Language Model Biasing for Multi-Pass Automatic Speech Recognition

Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Premkumar Natarajan

Raytheon BBN Technologies, USA

The language model (LM) in an automatic speech recognition (ASR) system provides a probability distribution over the hypothesis space. In typical use, the LM is trained off-line and remains static at run-time. While cache LMs, dialogue/style adaptation, and information-retrieval-based biasing provide some ability to modify the LM at run-time, they are limited in scope, susceptible to recognition error, place restrictions on the training data and/or test sets, or cannot be implemented in on-line, interactive systems. In this paper, we describe a novel LM biasing method suitable for multi-pass ASR systems. We use k-best lists from the initial recognition pass to obtain a confidence-weighted biasing of the LM training corpus, which is then used to train an LM biased toward the test input. The biased LM is used in the second pass to obtain refined hypotheses, either by re-decoding or by re-ranking the k-best list. We sketch an on-line implementation of this scheme that lends itself to integration within low-latency systems. The proposed method is robust to recognition error and operates on individual utterances without the need for dialogue context. The biased LMs provide significant reductions in perplexity and consistent improvements in word error rate (WER) over unbiased, state-of-the-art, large-vocabulary baseline ASR systems. On the Farsi and English test sets, we obtained relative reductions in perplexity of 24.5% and 31.6%, respectively. Additionally, relative reductions of 1.6% and 1.8% in WER were obtained for large-vocabulary Farsi and English ASR, respectively.
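The general recipe described in the abstract can be sketched as follows. This is an illustrative toy implementation, not the authors' system: the similarity measure (confidence-weighted word overlap), the unigram order, the function names, and the interpolation weight are all assumptions made for clarity; the paper's actual corpus-weighting and LM-training details are in the full text.

```python
from collections import Counter

def confidence_weighted_bias(kbest, corpus, top_n=2):
    """Select a biasing subcorpus: score each training sentence by its
    word overlap with the first-pass k-best hypotheses, weighting each
    hypothesis word by the hypothesis posterior confidence.
    (Illustrative similarity measure, not the paper's.)"""
    # Confidence-weighted bag of words over the k-best list
    weights = Counter()
    for hyp, conf in kbest:
        for w in hyp.split():
            weights[w] += conf
    scored = []
    for sent in corpus:
        words = sent.split()
        # Length-normalized overlap score for this training sentence
        score = sum(weights[w] for w in words) / max(len(words), 1)
        scored.append((score, sent))
    scored.sort(reverse=True)
    return [s for _, s in scored[:top_n]]

def interpolated_unigram(bias_corpus, background_probs, lam=0.5):
    """Train a unigram LM on the biasing subcorpus and linearly
    interpolate it with a background unigram LM; the biased LM can then
    rescore (re-rank) the k-best list in a second pass."""
    counts = Counter(w for s in bias_corpus for w in s.split())
    total = sum(counts.values())
    vocab = set(background_probs) | set(counts)
    return {w: lam * counts[w] / total
               + (1 - lam) * background_probs.get(w, 0.0)
            for w in vocab}
```

Because the biasing is driven by the whole confidence-weighted k-best list rather than the single top hypothesis, a misrecognized 1-best does not dominate the selection, which is the sense in which the method is robust to recognition error.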

Full Paper

Bibliographic reference.  Ananthakrishnan, Sankaranarayanan / Tsakalidis, Stavros / Prasad, Rohit / Natarajan, Premkumar (2011): "On-line language model biasing for multi-pass automatic speech recognition", In INTERSPEECH-2011, 621-624.