ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium

September 18-20, 2000
Paris, France

Performance of Mandarin Connected Digit Recognizer with Word Duration Modeling

Gang Peng, Bo Zhang, and William S-Y. Wang

Department of Electronic Engineering, City University of Hong Kong, China

Digit string recognition is required in many applications, such as automatic banking systems and database information retrieval systems. To design a high-performance recognizer, this study explores duration information. In a Mandarin connected digit recognizer, insertion and deletion errors account for more than two thirds of the total recognition errors because the digit vocabulary contains two monophonemic digits and a heavily rhotacized vowel. A major weakness of conventional Hidden Markov Models (HMMs) is that they implicitly model state durations by a geometric distribution. To use duration information more efficiently, we propose a method that models context-dependent word duration information and incorporates it directly into the decoding algorithm. Experimental results show that this method reduces the word error rate by as much as 32.1%.
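The remark about geometric state durations can be illustrated with a minimal sketch. It is not the authors' method. It only shows the standard result that an HMM state with self-loop probability a is occupied for exactly d frames with probability a^(d-1) * (1 - a). The function names and the value a = 0.9 are assumptions chosen for illustration.

```python
def state_duration_pmf(self_loop_prob, d):
    """Probability that an HMM state is occupied for exactly d frames.

    With self-loop probability a, the state is re-entered d - 1 times
    and then left once, giving a ** (d - 1) * (1 - a).
    """
    if d < 1:
        return 0.0
    return self_loop_prob ** (d - 1) * (1.0 - self_loop_prob)


def expected_duration(self_loop_prob):
    """Mean of the geometric duration distribution, in frames."""
    return 1.0 / (1.0 - self_loop_prob)


a = 0.9  # assumed self-loop probability, for illustration only
pmf = [state_duration_pmf(a, d) for d in range(1, 6)]
for d, p in enumerate(pmf, start=1):
    print(f"P(duration = {d}) = {p:.4f}")
print(f"expected duration = {expected_duration(a):.1f} frames")
```

Whatever value of a is chosen, the most probable duration is always a single frame and the probability decays monotonically from there, which is a poor fit for words whose durations cluster around a typical length. This mismatch is the weakness that explicit duration modeling is meant to address.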



Bibliographic reference.  Peng, Gang / Zhang, Bo / Wang, William S-Y. (2000): "Performance of Mandarin connected digit recognizer with word duration modeling", In ASR-2000, 140-144.