4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper presents a two-level classification scheme applicable to practical discrete-utterance recognition systems. Both the fast match and the fine match employ whole-word CDHMM (continuous-density hidden Markov model) models. The fast match is based on total data reduction, which includes both the minimization of the acoustic data flow (the numbers of speech frames and features) and the reduction of the basic HMM parameters (the numbers of states and mixtures). The fast match parameters are chosen by a procedure that aims at minimizing the total classification time while preserving the maximum available recognition accuracy. On a medium-size vocabulary task (121 city names), the fast match reduced recognition time to approximately 20% of that of the original one-level system, with a negligible loss of accuracy. The time savings were even greater for a system with multi-mixture HMMs.
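The two-level idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how fast-match results are handed to the fine match, so the n-best shortlist used here is an assumption, as are the single Gaussian per state, the fixed transition probabilities, the function names (`reduce_data`, `fast_match`, `recognize`), and the toy words and models. The fast match scores every word on decimated frames and a feature subset with smaller models; the fine match rescores only the shortlisted words on the full data.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def viterbi_score(frames, states, log_stay=math.log(0.5), log_next=math.log(0.5)):
    """Viterbi log-score of a left-to-right whole-word model.

    states: list of (mean, var) pairs, one Gaussian per state (assumption).
    The path must start in the first state and end in the last one.
    """
    neg_inf = float("-inf")
    score = [neg_inf] * len(states)
    score[0] = log_gauss(frames[0], *states[0])
    for x in frames[1:]:
        new = []
        for j, (mean, var) in enumerate(states):
            stay = score[j] + log_stay
            move = score[j - 1] + log_next if j > 0 else neg_inf
            new.append(max(stay, move) + log_gauss(x, mean, var))
        score = new
    return score[-1]

def reduce_data(frames, frame_step, n_features):
    """Keep every frame_step-th frame and only the first n_features features."""
    return [f[:n_features] for f in frames[::frame_step]]

def fast_match(frames, fast_models, frame_step, n_features, n_best):
    """Score all words on reduced data with small models; return a shortlist."""
    reduced = reduce_data(frames, frame_step, n_features)
    scores = {w: viterbi_score(reduced, m) for w, m in fast_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_best]

def recognize(frames, fast_models, fine_models,
              frame_step=2, n_features=1, n_best=2):
    """Two-level classification: fast match prunes, fine match decides."""
    candidates = fast_match(frames, fast_models, frame_step, n_features, n_best)
    return max(candidates, key=lambda w: viterbi_score(frames, fine_models[w]))

# Hypothetical toy vocabulary: 4-state, 2-feature fine models ...
fine_models = {
    "word_a": [([0, 0], [1, 1]), ([1, 1], [1, 1]), ([2, 2], [1, 1]), ([3, 3], [1, 1])],
    "word_b": [([3, 3], [1, 1]), ([2, 2], [1, 1]), ([1, 1], [1, 1]), ([0, 0], [1, 1])],
    "word_c": [([0, 3], [1, 1]), ([1, 2], [1, 1]), ([2, 1], [1, 1]), ([3, 0], [1, 1])],
}
# ... and 2-state, 1-feature fast models for the same words.
fast_models = {
    "word_a": [([0.5], [1]), ([2.5], [1])],
    "word_b": [([2.5], [1]), ([0.5], [1])],
    "word_c": [([0.5], [1]), ([2.5], [1])],
}
utterance = [[0, 0], [0.1, 0.1], [1, 1], [1.1, 0.9],
             [2, 2], [2.1, 2], [3, 3], [2.9, 3.1]]

print(recognize(utterance, fast_models, fine_models))
```

In this toy setup the fast match cannot separate `word_a` from `word_c`, because they differ only in the discarded second feature, but it cheaply eliminates `word_b`; the fine match then resolves the remaining pair on the full data. The parameters `frame_step`, `n_features`, `n_best`, and the fast-model sizes stand in for the fast match parameters whose choice trades classification time against accuracy.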
Bibliographic reference. Nouza, Jan (1996): "Discrete-utterance recognition with a fast match based on total data reduction", In ICSLP-1996, 2107-2110.