ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium, September 18-20, 2000
Digit string recognition is required in many applications, such as automatic banking systems and database information retrieval systems. To design a high-performance recognizer, this study explores the use of duration information. In a Mandarin connected digit recognizer, insertion and deletion errors account for more than two thirds of all recognition errors, because the digit vocabulary contains two monophonemic digits and a heavily rhotacized vowel. A major weakness of conventional Hidden Markov Models (HMMs) is that they implicitly model state durations with a geometric distribution. To use duration information more effectively, we propose a method that models context-dependent word duration and incorporates it directly into the decoding algorithm. Experimental results show that this method reduces the word error rate by as much as 32.1%.
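The weakness of the geometric duration model can be illustrated with a minimal sketch. A state with self-loop probability a stays for exactly d frames with probability a^(d-1)(1-a), so the most likely duration is always one frame, whatever the true length of the sound. The second function is purely hypothetical: it assumes a Gaussian word-duration score, which the abstract does not specify, and the names `geometric_duration_pmf` and `gaussian_log_duration_score` are illustrative only.

```python
import math


def geometric_duration_pmf(self_loop_prob: float, d: int) -> float:
    """Probability that an HMM state with the given self-loop
    probability is occupied for exactly d consecutive frames."""
    if d < 1:
        return 0.0
    return self_loop_prob ** (d - 1) * (1.0 - self_loop_prob)


def gaussian_log_duration_score(d: int, mean: float, std: float) -> float:
    """Hypothetical explicit word-duration log-score.
    The Gaussian form is an assumption, not taken from the paper."""
    z = (d - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


a = 0.9  # assumed self-loop probability, for illustration only
pmf = [geometric_duration_pmf(a, d) for d in range(1, 200)]

# The geometric model always peaks at a duration of one frame.
mode = 1 + pmf.index(max(pmf))

# Expected state duration under the geometric model is 1 / (1 - a).
mean_duration = 1.0 / (1.0 - a)
```

An explicit duration score of this kind peaks at a typical duration instead of at one frame, so a decoder that adds it to the path score would penalize implausibly short or long word hypotheses. How the paper conditions the duration model on context and combines it with the acoustic score is not stated in the abstract.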
Bibliographic reference. Peng, Gang / Zhang, Bo / Wang, William S-Y. (2000): "Performance of Mandarin connected digit recognizer with word duration modeling", In ASR-2000, 140-144.