Third International Conference on Spoken Language Processing (ICSLP 94)
An architecture for speech recognition is proposed, based on four stages: (1) recognition of the most likely phone sequence using centisecond Hidden Markov Models (HMMs); (2) phone-based lexical and syntactic forward decoding; (3) an A* phone-based backward pass, producing a Word Hypothesis Structure (WHS); (4) accurate rescoring of the search sub-space represented by the WHS using centisecond HMMs. Experiments carried out on two different tasks show that a recognizer based on the proposed four-stage architecture achieves performance comparable to that of a classic one-stage recognizer. Experimental results also show that WHSs built with this approach yield the same recognition performance as WHSs built using centisecond HMMs, with a potential speed-up in WHS generation proportional to the average phoneme duration in centiseconds.
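The speed-up stated above can be illustrated with a rough count of decoding steps. The sketch below is not taken from the paper: it assumes that the cost of WHS generation scales with the number of time units processed (10 ms frames for a centisecond-based pass, phone segments for a phone-based pass). The function name and the phone durations are hypothetical, chosen only for illustration.

```python
def whs_speedup(phone_durations_cs):
    """Ratio of frame-level to phone-level decoding steps.

    phone_durations_cs: duration of each recognized phone, in centiseconds
    (one centisecond = one 10 ms frame). Hypothetical input, not paper data.
    """
    n_frames = sum(phone_durations_cs)  # steps for a centisecond-based pass
    n_phones = len(phone_durations_cs)  # steps for a phone-based pass
    return n_frames / n_phones          # equals the average phone duration


# Illustrative utterance of five phones (made-up durations).
durations = [6, 9, 7, 10, 8]
print(whs_speedup(durations))
```

Under this assumption the ratio is the average phone duration in centiseconds, which is the proportionality the abstract refers to; real gains would also depend on overheads that this count ignores.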
Bibliographic reference. De Mori, Renato / Giuliani, Diego / Gretter, Roberto (1994): "Phone-based prefiltering for continuous speech recognition", in ICSLP-1994, 2203-2206.