8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Mis-Recognized Utterance Detection Using Multiple Language Models Generated by Clustered Sentences

Katsuhisa Fujinaga (1), Hiroaki Kokubo (2), Hirofumi Yamamoto (2), Genichiro Kikui (2), Hiroshi Shimodaira (1)

(1) JAIST, Japan
(2) ATR-SLT, Japan

This paper proposes a new method of detecting mis-recognized utterances based on a ROVER-like voting scheme. Although the ROVER approach is effective in improving recognition accuracy, it has two serious problems from a practical point of view: 1) it is difficult to construct multiple automatic speech recognition (ASR) systems, and 2) the computational cost increases with the number of ASR systems. To overcome these problems, a new method is proposed in which only a single acoustic engine is employed, but multiple language models (LMs), consisting of a baseline (main) LM and sub LMs, are used. The sub LMs are generated from clustered sentences and are used to rescore the word lattice given by the main LM. As a result, the computational cost is greatly reduced. In experiments, the proposed method achieved 18-point higher precision than the baseline at a 10% loss of recall, and 22-point higher precision at a 20% loss of recall.
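The voting idea can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so the code below assumes the simplest possible one: each sub LM yields one rescored hypothesis for the utterance, and the utterance is flagged as mis-recognized when too few of those hypotheses match the main-LM hypothesis exactly. The function names, the sentence-level exact-match criterion, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def agreement_ratio(main_hyp: Sequence[str], sub_hyps: List[Sequence[str]]) -> float:
    """Fraction of sub-LM hypotheses identical to the main-LM hypothesis.

    Assumption: agreement is judged at the sentence level by exact match
    of the word sequences. The paper may use a different voting rule.
    """
    if not sub_hyps:
        return 1.0  # no sub LMs available, so nothing can disagree
    matches = sum(1 for hyp in sub_hyps if list(hyp) == list(main_hyp))
    return matches / len(sub_hyps)


def is_misrecognized(
    main_hyp: Sequence[str],
    sub_hyps: List[Sequence[str]],
    threshold: float = 0.5,  # illustrative value, not from the paper
) -> bool:
    """Flag the utterance when agreement falls below the threshold."""
    return agreement_ratio(main_hyp, sub_hyps) < threshold


if __name__ == "__main__":
    main = ["book", "a", "room"]
    subs = [
        ["book", "a", "room"],
        ["look", "a", "room"],
        ["book", "a", "broom"],
    ]
    # Only one of three sub-LM hypotheses agrees with the main hypothesis.
    print(is_misrecognized(main, subs))
```

Raising the threshold in this sketch flags more utterances, which is one way a trade-off between precision and recall like the one reported above could arise.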


Bibliographic reference. Fujinaga, Katsuhisa / Kokubo, Hiroaki / Yamamoto, Hirofumi / Kikui, Genichiro / Shimodaira, Hiroshi (2003): "Mis-recognized utterance detection using multiple language models generated by clustered sentences", in EUROSPEECH-2003, 2709-2712.