Sixth European Conference on Speech Communication and Technology
A novel multi-pass speech recognition method is presented. The method is organized in two stages. The first stage decodes the input speech using an acoustic model and outputs the most probable sequence of basic units. The second stage searches for the most probable word sequence given the decoding output of the first stage. The novel point is the use of an error correction model (ECM) in the second stage. With the ECM, the second stage can recover from decoding errors made in the first stage. The ECM is realized as a statistical model whose parameters are estimated from training data. The first stage is realized by a one-pass DP algorithm with triphone models. The second stage is realized by a best-first search algorithm with the ECM and an N-gram language model. The presented method was evaluated on large-vocabulary continuous speech recognition. Using the N-best decoding outputs of the first stage and a 64K-word trigram language model, we achieved a word accuracy of 89.1% on open data with a test-set perplexity of 129.
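The second stage can be illustrated with a toy sketch. The abstract does not specify the form of the ECM, so the code below makes several assumptions: the ECM is a substitution-only confusion table P(observed unit | intended unit), the language model is a bigram rather than the trigram used in the paper, only a single 1-best unit sequence is searched rather than N-best outputs, and all names (`ecm_cost`, `second_stage`, the toy lexicon and probabilities) are hypothetical. It is meant only to show how a best-first search can combine error-model and language-model scores to recover from a first-stage decoding error.

```python
import heapq
import math


def ecm_cost(observed, intended, ecm, floor=1e-6):
    """Negative log-probability that `intended` units were decoded as `observed`.

    Assumption: substitution errors only, so both sequences have equal length.
    `ecm` maps (observed_unit, intended_unit) -> probability.
    """
    if len(observed) != len(intended):
        return math.inf
    return sum(-math.log(ecm.get((o, i), floor)) for o, i in zip(observed, intended))


def second_stage(units, lexicon, ecm, bigram, lm_floor=1e-6):
    """Best-first (uniform-cost) search for the cheapest word sequence covering `units`."""
    # Heap entries: (accumulated cost, position in units, previous word, words so far)
    heap = [(0.0, 0, "<s>", ())]
    expanded = set()
    while heap:
        cost, pos, prev, words = heapq.heappop(heap)
        if pos == len(units):
            return list(words), cost  # first complete hypothesis popped is the cheapest
        if (pos, prev) in expanded:
            continue
        expanded.add((pos, prev))
        for word, pron in lexicon.items():
            end = pos + len(pron)
            if end > len(units):
                continue
            step = ecm_cost(units[pos:end], pron, ecm)
            step += -math.log(bigram.get((prev, word), lm_floor))
            heapq.heappush(heap, (cost + step, end, word, words + (word,)))
    return None, math.inf


# Toy data (all values invented for illustration).
lexicon = {"cat": ("k", "a", "t"), "cap": ("k", "a", "p"), "sat": ("s", "a", "t")}
ecm = {(u, u): 0.9 for u in "kapts"}
ecm[("p", "t")] = 0.1  # an intended "t" is sometimes decoded as "p"
bigram = {("<s>", "cat"): 0.5, ("cat", "sat"): 0.5,
          ("<s>", "cap"): 0.01, ("cap", "sat"): 0.01}

# First-stage output containing one error: "t" was decoded as "p".
units = ["k", "a", "p", "s", "a", "t"]
words, cost = second_stage(units, lexicon, ecm, bigram)
print(words)  # ['cat', 'sat']
```

In this toy case the unit sequence matches "cap sat" exactly, but the language model makes "cat sat" far more likely, and the confusion table makes the p-for-t substitution cheap enough that the search prefers the corrected hypothesis.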
Bibliographic reference. Abe, Yoshiharu / Itsui, Hiroyasu / Maruta, Yuzo / Nakajima, Kunio (1999): "A two-stage speech recognition method with an error correction model", In EUROSPEECH'99, 443-446.