There are two types of methods for tone recognition of continuous speech: one-step and two-step approaches. Two-step approaches need to identify the syllable boundaries firstly, while one-step approaches do not. Previous studies mostly focus on two-step approaches. In this paper, a one-step approach using Multi-space distribution HMM (MSD-HMM) is investigated. The F0, which only exists in voiced speech, is modeled by MSD-HMM. Then, a tonal syllable network is built based on the reference and Viterbi search is carried out on it to find the best tone sequence. Two modifications to the conventional tri-phone HMM models are investigated: tone-based context expansion and syllable-based model units. The experimental results proved that tone-based context information is more important for tone recognition and syllable-based HMM models are much better than phone-based ones. The final tone correct rate result is 88.8%, which is much higher than the state-of-the-art two-step approaches.
Bibliographic reference. Liu, Changliang / Ge, Fengpei / Pan, Fuping / Dong, Bin / Yan, Yonghong (2009): "A one-step tone recognition approach using MSD-HMM for continuous speech", In INTERSPEECH-2009, 3015-3018.