ISCA Archive Interspeech 2005

Incorporating tone-related MLP posteriors in the feature representation for Mandarin ASR

Xin Lei, Mei-Yuh Hwang, Mari Ostendorf

Tone plays a crucial role in Mandarin speech by distinguishing words that would otherwise be ambiguous. In most state-of-the-art Mandarin automatic speech recognition systems, tonal acoustic units are used and F0 features are appended to the spectral features (MFCC/PLP). However, a tone depends on the F0 contour over a time span much longer than a single frame. Ideally, systems would compute the frame-level likelihood of a tone using more than the F0 and derivative values at the current frame. Inspired by the tandem approach, we propose to extract tone-related features for each frame by using longer acoustic context information in a multi-layer perceptron (MLP). The extracted tone-related posteriors are then appended to the spectral feature vector to form a new feature vector for back-end HMM systems. Results show that adding these tone-related MLP posterior features yields a significant improvement in a Mandarin conversational telephone speech recognition task. In one configuration, the character error rate was reduced from 35.7% to 33.2%, an absolute reduction of 2.5 percentage points.
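
The feature pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 39-dimensional spectral features, the 3-dimensional F0 input, the 9-frame context window, the 5 tone classes, and the hidden layer size are all assumptions chosen for illustration, and the MLP weights are random rather than trained. The sketch only shows the data flow: stack neighboring frames, pass them through an MLP with a softmax output, and append the resulting per-frame posteriors to the spectral feature vector.

```python
import numpy as np

def stack_context(feats, left, right):
    """Concatenate each frame with `left` past and `right` future frames.
    Edges are padded by repeating the first and last frame."""
    num_frames = feats.shape[0]
    padded = np.concatenate(
        [np.repeat(feats[:1], left, axis=0), feats,
         np.repeat(feats[-1:], right, axis=0)], axis=0)
    windows = [padded[i:i + num_frames] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

def mlp_posteriors(x, w1, b1, w2, b2):
    """One hidden layer with tanh, then a numerically stable softmax."""
    hidden = np.tanh(x @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def append_tone_posteriors(spectral, tone_input, params, left, right):
    """Return (augmented features, posteriors) for every frame."""
    stacked = stack_context(tone_input, left, right)
    post = mlp_posteriors(stacked, *params)
    return np.concatenate([spectral, post], axis=1), post

# Hypothetical dimensions; none of these values come from the paper.
rng = np.random.default_rng(0)
T, SPEC_DIM, F0_DIM, N_TONES, HIDDEN = 50, 39, 3, 5, 16
LEFT = RIGHT = 4

spectral = rng.standard_normal((T, SPEC_DIM))   # stand-in for MFCC/PLP
tone_input = rng.standard_normal((T, F0_DIM))   # stand-in for F0 features

in_dim = (LEFT + RIGHT + 1) * F0_DIM
params = (rng.standard_normal((in_dim, HIDDEN)) * 0.1, np.zeros(HIDDEN),
          rng.standard_normal((HIDDEN, N_TONES)) * 0.1, np.zeros(N_TONES))  # untrained

augmented, posteriors = append_tone_posteriors(
    spectral, tone_input, params, LEFT, RIGHT)
print(augmented.shape)
```

With these assumed sizes, each frame's feature vector grows from 39 to 44 dimensions, and the appended values in each row sum to one because they come from a softmax. In a real system the MLP would be trained on frame-level tone labels before its outputs are passed to the HMM back end.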


doi: 10.21437/Interspeech.2005-134

Cite as: Lei, X., Hwang, M.-Y., Ostendorf, M. (2005) Incorporating tone-related MLP posteriors in the feature representation for Mandarin ASR. Proc. Interspeech 2005, 2981-2984, doi: 10.21437/Interspeech.2005-134

@inproceedings{lei05_interspeech,
  author={Xin Lei and Mei-Yuh Hwang and Mari Ostendorf},
  title={{Incorporating tone-related MLP posteriors in the feature representation for Mandarin ASR}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2981--2984},
  doi={10.21437/Interspeech.2005-134}
}