Sixth International Conference on Spoken Language Processing
In this paper, we address the problem of building a high-performance, speaker-independent continuous Mandarin digit string recognizer, focusing on exploiting context information and prosodic knowledge. We propose a data-driven decision-tree method for training tri-phone acoustic models. A digit-specific question set is designed according to the properties of the Chinese language, and the derived tri-phone models describe the acoustic observations more accurately. As a prosodic cue, we present a novel Gaussian Mixture Density Duration Model (GMDDM). Unlike traditional normalization or single-parameter strategies, the proposed duration model is context independent: context variation is naturally embodied in a mixture of multiple Gaussian distributions, and the number of mixture components is selected automatically according to a maximum likelihood criterion. The likelihood score of this simple but effective duration model is combined with the acoustic score as heuristic information for backward A* decoding of the word graph. Experimental results show that the tri-phone acoustic models reduce the string error rate by 12.9% on average. When the GMDDM is applied, the string error rate is further reduced by 22.7%, which demonstrates the usefulness of the GMDDM.
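As a rough illustration of how a duration likelihood of this kind might be combined with an acoustic score, the following minimal Python sketch evaluates a duration under a one-dimensional Gaussian mixture and adds the result to an acoustic log score. It is not the authors' implementation: the function names, the mixture parameters, the `duration_weight` factor, and the use of frame counts as the duration unit are all assumptions made for illustration, and the abstract does not state how the two scores are weighted.

```python
import math


def gmm_log_likelihood(duration, weights, means, variances):
    """Log-likelihood of one duration under a 1-D Gaussian mixture.

    All parameters are illustrative; in practice they would be
    estimated from training data (for example with EM).
    """
    terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of a univariate Gaussian density with mean mu and variance var.
        log_gauss = -0.5 * (math.log(2.0 * math.pi * var)
                            + (duration - mu) ** 2 / var)
        terms.append(math.log(w) + log_gauss)
    # Log-sum-exp over components for numerical stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def combined_score(acoustic_log_score, duration, gmm, duration_weight=1.0):
    """Add a weighted duration log-likelihood to an acoustic log score.

    duration_weight is a hypothetical tuning factor, not a value
    taken from the paper.
    """
    weights, means, variances = gmm
    return acoustic_log_score + duration_weight * gmm_log_likelihood(
        duration, weights, means, variances)


# Hypothetical two-component duration model for one digit (in frames).
example_gmm = ([0.6, 0.4], [18.0, 30.0], [9.0, 25.0])
score = combined_score(-120.0, 20.0, example_gmm, duration_weight=0.5)
```

In a word-graph decoder, a score of this form could serve as the per-word quantity accumulated along a path; a duration far from every mixture component lowers the combined score and so penalizes that hypothesis.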
Bibliographic reference. Deng, Yonggang / Huang, Taiyi / Xu, Bo (2000): "Towards high performance continuous Mandarin digit string recognition", In ICSLP-2000, vol.3, 642-645.