Many recent studies on tone recognition have focused on model-level issues, for either tone and prosody labeling or large-vocabulary continuous speech recognition (LVCSR). This paper, in contrast, focuses on feature-level issues. We propose normalizing pitch features with both the syllable-level mean and the utterance-level standard deviation, instead of the common approach that uses only the utterance-level mean. We show the robustness of this normalization both through its affine-invariance property and through experimental results. In addition, we incorporate tone posteriorgrams into second-pass tone recognition, which further improves tone recognition accuracy.
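The normalization can be sketched as follows. For frame t in syllable s of an utterance, the normalized value is (f_t - mu_s) / sigma_u, where mu_s is the mean pitch of syllable s and sigma_u is the standard deviation of pitch over the whole utterance. If every pitch value undergoes an affine change f' = a*f + b with a > 0 (for example, a different speaker's range and register), the numerator scales by a and the denominator scales by a, so the result is unchanged. This is a minimal illustrative sketch, not the authors' implementation: it assumes the contour is already segmented into syllables, that each syllable has at least one voiced frame, and that the function name `normalize_pitch` is hypothetical. The paper may apply the normalization to log-F0 or to derived prosodic features rather than to raw pitch values.

```python
import statistics
from typing import List, Sequence


def normalize_pitch(syllables: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: normalize per-frame pitch values of one utterance.

    Each frame is shifted by the mean of its own syllable and scaled by the
    (population) standard deviation of all frames in the utterance.
    Assumes every syllable contains at least one voiced frame.
    """
    all_frames = [f for syl in syllables for f in syl]
    sigma_utt = statistics.pstdev(all_frames)  # utterance-level spread
    if sigma_utt == 0:
        raise ValueError("utterance pitch has zero variance")

    normalized = []
    for syl in syllables:
        mu_syl = statistics.fmean(syl)  # syllable-level mean (removes offset b)
        normalized.append([(f - mu_syl) / sigma_utt for f in syl])
    return normalized


if __name__ == "__main__":
    # Two toy syllables with pitch values in Hz (illustrative data only).
    utt = [[200.0, 210.0, 220.0], [180.0, 170.0]]
    # Same contour under an affine transform f' = 1.5 * f + 30.
    shifted = [[1.5 * f + 30.0 for f in syl] for syl in utt]

    z1 = normalize_pitch(utt)
    z2 = normalize_pitch(shifted)
    same = all(abs(a - b) < 1e-9
               for s1, s2 in zip(z1, z2) for a, b in zip(s1, s2))
    print(same)
```

By comparison, subtracting only the utterance-level mean removes the offset b but leaves the scale factor a in place, so that approach is invariant to shifts but not to scaling.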
Bibliographic reference. Wang, Yow-Bang / Lee, Lin-shan (2010): "Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram", In INTERSPEECH-2010, 2850-2853.