Interspeech'2005 - Eurospeech
Tone plays a crucial role in Mandarin speech by distinguishing otherwise ambiguous words. In most state-of-the-art Mandarin automatic speech recognition systems, tonal acoustic units are used and F0 features are appended to the spectral features (MFCC/PLP). However, a tone depends on the F0 contour over a time span much longer than a single frame. Ideally, systems would compute the frame-level likelihood of a tone using more than the F0 value and its derivatives at the current frame. Inspired by the tandem approach, we propose to extract tone-related features for each frame by feeding longer acoustic context into a multi-layer perceptron (MLP). The resulting tone-related posteriors are then appended to the spectral feature vector to form a new feature vector for back-end HMM systems. Results show that adding these tone-related MLP posterior features yields significant improvement on a Mandarin conversational telephone speech recognition task. In one configuration, the character error rate was reduced from 35.7% to 33.2%.
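The tandem-style pipeline described above (stack a context window of frames, run an MLP to get per-frame tone posteriors, append them to the spectral vector) can be sketched in plain Python. This is a minimal illustration under assumptions not stated in the abstract: the MLP input is a window of ±4 frames of F0-related features (F0 and its first and second derivatives), the network has a single sigmoid hidden layer, the outputs are five tone classes (four lexical tones plus the neutral tone), and the weights are random placeholders rather than a trained network. All function names are hypothetical.

```python
import math
import random


def stack_context(frames, left, right):
    """Concatenate each frame with its neighbours; edges repeat the first/last frame."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            idx = min(max(k, 0), n - 1)  # clamp to the valid frame range
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def mlp_posteriors(x, w1, b1, w2, b2):
    """One sigmoid hidden layer, softmax output over the (assumed) tone classes."""
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)


def random_params(n_in, n_hidden, n_out, seed=0):
    """Hypothetical untrained weights; a real system would train the MLP on tone labels."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2


def tandem_features(spectral, f0_feats, params, left=4, right=4):
    """Append per-frame tone posteriors (from a context window) to each spectral vector."""
    contexts = stack_context(f0_feats, left, right)
    return [list(spec) + mlp_posteriors(ctx, *params)
            for spec, ctx in zip(spectral, contexts)]


# Usage with placeholder data: 20 frames, 39-dim spectral vectors (e.g. MFCC + deltas),
# 3-dim F0 features per frame, 9-frame window -> 27 MLP inputs, 5 tone outputs.
T = 20
spectral = [[0.0] * 39 for _ in range(T)]
f0 = [[0.1 * t, 0.0, 0.0] for t in range(T)]
params = random_params(n_in=3 * 9, n_hidden=16, n_out=5)
feats = tandem_features(spectral, f0, params)
print(len(feats), len(feats[0]))  # → 20 44
```

The key point the sketch illustrates is that each frame's tone posterior depends on neighbouring frames' F0 values, not just the current frame, which is exactly the limitation of appending raw F0 features that the abstract highlights.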
Bibliographic reference. Lei, Xin / Hwang, Mei-Yuh / Ostendorf, Mari (2005): "Incorporating tone-related MLP posteriors in the feature representation for Mandarin ASR", In INTERSPEECH-2005, 2981-2984.