Third International Conference on Spoken Language Processing (ICSLP 94)
A new clustering scheme was proposed to improve HMM-based phoneme recognition with temporal modeling. Close observation of the temporal correspondences between the training data and their corresponding phoneme HMMs indicated that there were two extreme cases: one in which a phoneme group contained several types of correspondence that were completely different from one another, and the other in which it contained only one type. Although temporal modeling techniques were commonly used to incorporate temporal information into HMMs, good modeling was not obtained in the former case. To cope with this problem, a new scheme was proposed in which the training data for phonemes in the former case were clustered into several smaller groups. The clustering was conducted so as to reduce the variation in temporal correspondence within each group. After the clustering, a new HMM was constructed for each resulting group. Using the proposed method, speaker-dependent recognition experiments were conducted on phonemes segmented from isolated words. An increase of a few percent was observed in the recognition rate, indicating the validity of the proposed method.
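The clustering step described above can be illustrated with a minimal sketch. The abstract does not state how temporal correspondence is represented or which clustering algorithm was used, so everything here is an assumption for illustration: each training token is summarized by the number of frames aligned to each HMM state (for example, from a Viterbi alignment), these counts are normalized to fractions of the token length, and a plain k-means procedure groups tokens with similar profiles. The names `duration_profile` and `cluster_tokens` are hypothetical.

```python
from typing import List, Sequence


def duration_profile(state_durations: Sequence[int]) -> List[float]:
    """Convert per-state frame counts into fractions of the token length."""
    total = sum(state_durations)
    if total <= 0:
        raise ValueError("token must contain at least one frame")
    return [d / total for d in state_durations]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two duration profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_tokens(durations: Sequence[Sequence[int]], k: int,
                   iterations: int = 20) -> List[int]:
    """Assign each token to one of k groups with similar temporal structure.

    Assumption: k-means on normalized state durations stands in for the
    clustering criterion of the paper, which the abstract does not specify.
    """
    profiles = [duration_profile(d) for d in durations]

    # Deterministic farthest-point initialization of the centroids.
    centroids = [profiles[0]]
    while len(centroids) < k:
        farthest = max(
            profiles,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every token.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical frame counts for four tokens of one phoneme, three states each.
tokens = [[8, 1, 1], [7, 2, 1], [1, 1, 8], [2, 1, 7]]
labels = cluster_tokens(tokens, k=2)
```

In this toy input, the first two tokens spend most of their frames in the first state and the last two in the final state, so the two pairs fall into different groups. Under the scheme in the abstract, a separate HMM would then be trained on each group.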
Bibliographic reference. Minematsu, Nobuaki / Hirose, Keikichi (1994): "Speech recognition using HMM with decreased intra-group variation in the temporal structure", In ICSLP-1994, 187-190.