Third International Conference on Spoken Language Processing (ICSLP 94)
This paper describes a spoken word recognition system based on phoneme durations estimated from the speaking rate of the input speech. We found that normalizing phoneme duration by the average vowel duration of the input speech and by the average duration of each phoneme class was very effective in reducing the variation of phoneme duration. For this normalization, we propose a first-order linear regression equation that estimates the duration of each phoneme in the input speech as a function of the average vowel duration. We applied this method to isolated spoken word recognition. We prepared several kinds of equations that take various phoneme contexts into account and evaluated them by word recognition score. Using the equation based on the weighted sum of two estimates, one dependent on the preceding phoneme and one dependent on the following phoneme, the word recognition score was 97.3% for a 212-word vocabulary. This is 1.6% higher than the score obtained without speaking-rate information.
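The abstract describes the estimator only in outline. The following is a minimal sketch of the idea, not the authors' implementation. It assumes a per-phoneme (or per-context) model of the form d̂ = a·v̄ + b, where v̄ is the average vowel duration of the utterance, fitted by ordinary least squares. The two context-dependent estimates are combined with a weight `w`. The names `LinearDurationModel`, `fit_linear`, `combined_estimate`, and `normalized_duration`, the default `w = 0.5`, and the use of the observed/estimated ratio as the normalized duration are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinearDurationModel:
    """Hypothetical first-order model: d_hat = a * v_bar + b (durations in ms)."""
    a: float
    b: float

    def estimate(self, v_bar: float) -> float:
        # Estimated phoneme duration for an utterance with average vowel duration v_bar.
        return self.a * v_bar + self.b


def fit_linear(v_bars: list[float], durations: list[float]) -> LinearDurationModel:
    """Ordinary least-squares fit of phoneme duration against average vowel duration."""
    n = len(v_bars)
    if n < 2 or n != len(durations):
        raise ValueError("need at least two paired training points")
    mean_v = sum(v_bars) / n
    mean_d = sum(durations) / n
    sxx = sum((v - mean_v) ** 2 for v in v_bars)
    if sxx == 0:
        raise ValueError("average vowel durations must vary across training utterances")
    sxy = sum((v - mean_v) * (d - mean_d) for v, d in zip(v_bars, durations))
    a = sxy / sxx
    return LinearDurationModel(a=a, b=mean_d - a * mean_v)


def combined_estimate(prev_model: LinearDurationModel,
                      next_model: LinearDurationModel,
                      v_bar: float, w: float = 0.5) -> float:
    """Weighted sum of the preceding- and following-phoneme-dependent estimates.

    The weight w is a hypothetical parameter; the abstract does not give its value.
    """
    return w * prev_model.estimate(v_bar) + (1.0 - w) * next_model.estimate(v_bar)


def normalized_duration(observed: float, estimated: float) -> float:
    """Assumed normalization: ratio of observed to speaking-rate-dependent estimated duration."""
    return observed / estimated
```

For example, fitting synthetic data points (80, 60), (100, 70), and (120, 80) gives a = 0.5 and b = 20, so an utterance with v̄ = 110 ms yields an estimate of 75 ms for that phoneme. A ratio near 1 would then indicate a duration consistent with the speaker's current rate.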
Bibliographic reference. Osaka, Yukihiro / Makino, Shozo / Sone, Toshio (1994): "Spoken word recognition using phoneme duration information estimated from speaking rate of input speech", In ICSLP-1994, 191-194.