11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Mandarin Tone Recognition Using Affine-Invariant Prosodic Features and Tone Posteriorgram

Yow-Bang Wang (1), Lin-shan Lee (2)

(1) Academia Sinica, Taiwan
(2) National Taiwan University, Taiwan

Many recent studies on tone recognition have focused on model-level issues, whether for tone and prosody labeling or for LVCSR (large-vocabulary continuous speech recognition). In contrast, this paper focuses on feature-level issues. We propose normalizing pitch features with both the syllable-level mean and the utterance-level standard deviation, instead of the common approach that uses only the utterance-level mean. We demonstrate the robustness of this normalization through both its affine-invariance property and experimental results. In addition, we incorporate tone posteriorgrams into second-pass tone recognition, which further improves tone recognition accuracy.


Bibliographic reference.  Wang, Yow-Bang / Lee, Lin-shan (2010): "Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram", In INTERSPEECH-2010, 2850-2853.