EUROSPEECH 2003 - INTERSPEECH 2003
This paper proposes a new method of detecting mis-recognized utterances based on a ROVER-like voting scheme. Although the ROVER approach is effective in improving recognition accuracy, it has two serious problems from a practical point of view: 1) it is difficult to construct multiple automatic speech recognition (ASR) systems, and 2) the computational cost increases with the number of ASR systems. To overcome these problems, the proposed method employs only a single acoustic engine but uses multiple language models (LMs): a baseline (main) LM and several sub LMs. The sub LMs are generated from clustered sentences and are used to rescore the word lattice produced with the main LM, so the computational cost is greatly reduced. In experiments, the proposed method achieved 18-point higher precision than the baseline with a 10% loss of recall, and 22-point higher precision with a 20% loss of recall.
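The following is a minimal sketch of the single-engine, multiple-LM idea, written under stated assumptions rather than as the authors' implementation. It assumes the word lattice is approximated by an N-best list whose acoustic log-scores are computed once and reused, that each LM is a function returning a log-probability for a word sequence, and that the ROVER-like vote is simplified to an utterance-level agreement ratio: the fraction of sub LMs whose rescored 1-best matches the main LM's 1-best. The names `rescore`, `is_misrecognized`, `unigram_lm`, `lm_weight`, and `threshold` are hypothetical, and the toy unigram LMs merely stand in for sub LMs trained on sentence clusters.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical types: an N-best entry is (word sequence, acoustic log-score),
# and an LM maps a word sequence to a log-probability.
Words = Tuple[str, ...]
Hypothesis = Tuple[Words, float]
LanguageModel = Callable[[Words], float]


def rescore(nbest: Sequence[Hypothesis], lm: LanguageModel,
            lm_weight: float = 10.0) -> Words:
    """Return the 1-best word sequence under acoustic + weighted LM score.

    Acoustic scores are reused unchanged, so only the LM is re-evaluated;
    this mirrors running one acoustic engine and rescoring with many LMs.
    """
    best_words, _ = max(nbest, key=lambda h: h[1] + lm_weight * lm(h[0]))
    return best_words


def is_misrecognized(nbest: Sequence[Hypothesis], main_lm: LanguageModel,
                     sub_lms: Sequence[LanguageModel], threshold: float = 0.5,
                     lm_weight: float = 10.0) -> bool:
    """Flag the utterance if too few sub LMs agree with the main LM's 1-best.

    Assumption: utterance-level voting stands in for word-level ROVER voting.
    """
    main_best = rescore(nbest, main_lm, lm_weight)
    votes = sum(1 for lm in sub_lms if rescore(nbest, lm, lm_weight) == main_best)
    return votes / len(sub_lms) < threshold


def unigram_lm(logprobs: Dict[str, float], oov: float = -10.0) -> LanguageModel:
    """Toy unigram LM (hypothetical stand-in for a cluster-specific sub LM)."""
    return lambda words: sum(logprobs.get(w, oov) for w in words)


# Toy example: two competing hypotheses for one utterance.
nbest = [(("book", "a", "room"), -100.0), (("look", "a", "room"), -99.5)]
main = unigram_lm({"book": -2.0, "look": -3.0, "a": -1.0, "room": -2.0})
subs = [
    unigram_lm({"book": -2.5, "look": -2.0, "a": -1.0, "room": -2.0}),
    unigram_lm({"book": -4.0, "look": -1.0, "a": -1.0, "room": -2.0}),
    unigram_lm({"book": -1.0, "look": -5.0, "a": -1.0, "room": -2.0}),
]
print(rescore(nbest, main), is_misrecognized(nbest, main, subs))
```

Here the main LM prefers "book a room", but only one of the three sub LMs agrees, so the agreement ratio (1/3) falls below the assumed threshold of 0.5 and the utterance is flagged. Raising the threshold trades recall of correctly recognized utterances for higher precision in detecting errors, which is the kind of precision/recall trade-off the abstract reports.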
Bibliographic reference. Fujinaga, Katsuhisa / Kokubo, Hiroaki / Yamamoto, Hirofumi / Kikui, Genichiro / Shimodaira, Hiroshi (2003): "Mis-recognized utterance detection using multiple language models generated by clustered sentences", In EUROSPEECH-2003, 2709-2712.