Third International Conference on Spoken Language Processing (ICSLP 94)
HMM-based tone recognition methods were developed for monosyllabic and disyllabic speech of standard Chinese. Two-dimensional feature vectors were used in these methods to represent both the macroscopic and microscopic features of fundamental frequency contours well. To realize speaker normalization in the methods, an offset was introduced to the fundamental frequency. It was shown experimentally that the best performance was obtained when the mean fundamental frequency, averaged over several word utterances of the speaker, was used as the offset. The words should include every tone type equally. For disyllabic tone recognition, the developed method does not require segmentation into syllables. Besides the four lexical tone models, two models were added to represent the half 3rd tone and the first-syllable 4th tone in a 4th tone plus 4th tone sequence. The unvoiced region of a disyllable usually corresponds to the initial consonant of the second syllable. A model was also assigned to this region to reduce the coarticulation effect between the two syllables. The neutralized tone was tentatively included in the 4th tone group and, after the HMM-based recognition process, was separated on the basis of durational difference. With the developed methods, a correct recognition rate of 98.5% was achieved for monosyllables of multiple speakers, and 94.5% was obtained for disyllables of a single speaker.
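The speaker-normalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `speaker_offset` and `normalize_contour` are hypothetical, the use of the log-frequency domain is an assumption (the abstract does not say whether the offset is applied in Hz or in log Hz), and weighting each tone type equally in the mean is one possible reading of the requirement that the words cover every tone type equally.

```python
import math


def speaker_offset(utterances_by_tone):
    """Estimate a speaker's F0 offset from a few word utterances.

    utterances_by_tone maps a tone label to a list of F0 contours
    (lists of Hz values, with 0 marking unvoiced frames).
    Assumption: the mean is taken in the log domain, and each tone
    type contributes equally to the final average.
    """
    tone_means = []
    for tone, contours in utterances_by_tone.items():
        voiced = [math.log(f0) for contour in contours for f0 in contour if f0 > 0]
        if not voiced:
            raise ValueError(f"no voiced frames for tone {tone!r}")
        tone_means.append(sum(voiced) / len(voiced))
    return sum(tone_means) / len(tone_means)


def normalize_contour(contour, offset):
    """Subtract the speaker offset from a contour (log domain).

    Unvoiced frames (F0 == 0) are returned as None.
    """
    return [math.log(f0) - offset if f0 > 0 else None for f0 in contour]


# Toy calibration data for one hypothetical speaker, one word per tone shown.
calibration = {
    "tone1": [[200.0, 200.0]],
    "tone4": [[100.0, 100.0]],
}
offset = speaker_offset(calibration)
normalized = normalize_contour([200.0, 0.0, 100.0], offset)
```

In this sketch the normalized contour is centred on the speaker's own pitch level, so contours from speakers with different pitch ranges become more comparable before they are passed to the tone models.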
Bibliographic reference. Hu, Xinhui / Hirose, Keikichi (1994): "Recognition of Chinese tones in monosyllabic and disyllabic speech using HMM", In ICSLP-1994, 203-206.