7th International Conference on Spoken Language Processing
September 16-20, 2002
In this paper, we describe duration modeling techniques for an HMM-based speech recognizer. With this approach, deletion and insertion errors are substantially reduced in a Mandarin continuous digit recognizer. We present a simple duration penalty function that can be incorporated explicitly into Viterbi beam search with negligible additional computation. Different parametric distributions are investigated to approximate syllable-level duration accurately. A duration normalization scheme based on relative rate of speech (ROS) is proposed to eliminate the variation caused by differing speaking rates. To incorporate this normalization directly, an online dynamic ROS estimation method is introduced for real-time recognition. Experimental results show a significant performance improvement: the word error rate (WER) was reduced by 52.1% compared with our baseline recognition system.
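The following Python sketch illustrates how a syllable-level duration penalty and an online relative ROS estimate might fit together. It is not the authors' implementation. The abstract does not say which parametric distributions were tested or how the ROS is computed, so the gamma distribution, the moment-based fit, the ratio-based ROS estimate, and every name here (`gamma_from_moments`, `duration_penalty`, `OnlineRos`, `weight`) are assumptions made for illustration.

```python
import math


def gamma_from_moments(mean, var):
    """Fit gamma shape/scale by moment matching (an assumed fitting method)."""
    shape = mean * mean / var
    scale = var / mean
    return shape, scale


def gamma_log_pdf(x, shape, scale):
    """Log density of a gamma distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - shape * math.log(scale) - math.lgamma(shape))


def duration_penalty(frames, shape, scale, weight=1.0):
    """Log-domain score added to a path when a syllable ends after `frames` frames.

    `weight` is a hypothetical scaling factor balancing the duration score
    against the acoustic score.
    """
    return weight * gamma_log_pdf(frames, shape, scale)


class OnlineRos:
    """Running relative rate-of-speech estimate (assumed form).

    The estimate is the ratio of expected to observed duration over the
    syllables decoded so far, so values above 1 mean faster-than-average speech.
    """

    def __init__(self):
        self.observed = 0.0
        self.expected = 0.0

    def update(self, frames, mean_frames):
        # Accumulate the decoded duration and the model's mean duration.
        self.observed += frames
        self.expected += mean_frames

    def relative_rate(self):
        # Fall back to 1.0 (no normalization) before any syllable is seen.
        if self.observed == 0.0:
            return 1.0
        return self.expected / self.observed

    def normalize(self, frames):
        # Rescale an observed duration to the average speaking rate.
        return frames * self.relative_rate()


if __name__ == "__main__":
    shape, scale = gamma_from_moments(mean=20.0, var=25.0)
    ros = OnlineRos()
    ros.update(frames=10, mean_frames=20.0)
    ros.update(frames=15, mean_frames=30.0)
    raw = duration_penalty(10, shape, scale)
    normalized = duration_penalty(ros.normalize(10), shape, scale)
    print(f"relative ROS: {ros.relative_rate():.2f}")
    print(f"penalty raw: {raw:.2f}, after normalization: {normalized:.2f}")
```

In a decoder, a score of this kind would be added once per syllable exit during the search, so the extra cost is a single log-density evaluation per syllable-end hypothesis.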
Bibliographic reference. Dong, Rong / Zhu, Jie (2002): "On use of duration modeling for continuous digits speech recognition", In ICSLP-2002, 385-388.