ISCA Archive ICSLP 1994

State duration constraint using syllable duration for speech recognition

Yumi Wakita, Eiichi Tsuboka

For speech recognition using HMMs, we have proposed an adaptive syllable duration constraint method. The method constrains syllable durations using the relation among the syllables included in the same utterance [1]. The duration of the t-th syllable, d(t), is predicted from d0(1), ..., d0(t-1), the durations of the syllables that have already been recognized. After a syllable is recognized, if the duration of the t-th syllable is very different from the predicted value, the result is rejected. This method has two advantages: its applicability is independent of the speed of speech, and the durations of syllables within a sentence do not vary unnaturally. These qualities are not found in non-adaptive duration constraint methods. We confirmed that this method improves the recognition rate. If this syllable duration prediction (SDP) method can be used to constrain the duration of HMM states, the duration constraint can be integrated with matching, bringing SDP's improvements to both recognition rate and computing time.
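The predict-then-reject step can be illustrated with a minimal sketch. The abstract does not give the prediction formula or the rejection threshold, so the mean-based predictor, the `tolerance` parameter, and the function names below are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean


def predict_duration(previous):
    """Predict the next syllable duration from those already recognized.

    Assumption: the predictor is the plain mean of the earlier durations.
    Returns None when no syllable has been recognized yet.
    """
    if not previous:
        return None
    return mean(previous)


def accept_syllable(duration, previous, tolerance=0.5):
    """Accept a recognized syllable unless its duration is far from the prediction.

    Assumption: "very different" means a relative deviation larger than
    `tolerance` (a hypothetical parameter).
    """
    predicted = predict_duration(previous)
    if predicted is None:
        return True  # nothing to compare against for the first syllable
    return abs(duration - predicted) <= tolerance * predicted


recognized = [20, 22, 18]  # durations in frames of syllables recognized so far
print(accept_syllable(24, recognized))  # close to the prediction
print(accept_syllable(45, recognized))  # far from the prediction
```

Because the prediction is built from durations observed in the same utterance, the acceptance range in this sketch scales with the speaker's rate, which is the property the abstract attributes to the adaptive constraint.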

This paper proposes a new method of state duration constraint using SDP. First, the duration of the s-th state of the t-th syllable is predicted from the duration of the t-th syllable, which is itself predicted by SDP. Next, the matching period of the state is constrained using the predicted state duration.
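The two steps can be sketched as follows. The abstract does not say how a syllable duration is divided among states or how wide the permitted matching period is, so the proportional split (`state_shares`), the `slack` parameter, and the function name are hypothetical choices for illustration only.

```python
import math


def state_windows(syllable_duration, state_shares, slack=0.5):
    """Return a (min_frames, max_frames) window for each HMM state.

    Assumptions: each state receives a fixed share of the predicted syllable
    duration, and the allowed matching period is the predicted state duration
    widened by a relative `slack` on both sides.
    """
    total = sum(state_shares)
    windows = []
    for share in state_shares:
        predicted = syllable_duration * share / total
        shortest = max(1, math.floor(predicted * (1 - slack)))
        longest = max(shortest, math.ceil(predicted * (1 + slack)))
        windows.append((shortest, longest))
    return windows


# A syllable predicted to last 20 frames, modeled by three states whose
# typical durations are in the ratio 1:2:1.
print(state_windows(20, [1, 2, 1]))
```

During matching, a decoder would only consider state occupancies inside each window, which is one way the narrower search could reduce matching time.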

We evaluate this method on word and sentence recognition. For word recognition (100 words and 9 speakers, open test), the error reduction is 14% and the matching time is 25% shorter. For sentence recognition (50 sentences and 6 speakers, open test), the error reduction is 46% and the matching time is 50% shorter.


Cite as: Wakita, Y., Tsuboka, E. (1994) State duration constraint using syllable duration for speech recognition. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 195-198

@inproceedings{wakita94_icslp,
  author={Yumi Wakita and Eiichi Tsuboka},
  title={{State duration constraint using syllable duration for speech recognition}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={195--198}
}