ISCA Archive Interspeech 2008

Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model

Li-Wei Cheng, Lin-shan Lee

The incorporation of prosodic information in large vocabulary continuous speech recognition has attracted much attention in recent years, especially for tonal languages such as Mandarin Chinese. The tones of some syllables are very difficult to recognize correctly because of their highly complicated prosodic behavior, and tone recognition errors inevitably and seriously degrade recognition accuracy. We propose a new approach that introduces an extra tone category, "unknown": when the tone of a syllable is difficult to recognize, its tone information is not used. A two-stage prosodic model is developed for this purpose, and a 17.8% reduction in character error rate was achieved. Notably, this approach does not require speaker normalization for prosodic features.
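
The idea of selectively using tone information can be illustrated with a minimal Python sketch. The abstract does not describe the two stages, the features, or the decision rule, so everything here is an assumption made for illustration: the first stage is modeled as a simple confidence gate on hypothetical tone posteriors, the second stage as a log-score added to each recognition hypothesis, and the names `classify_tone`, `tone_score`, `rescore`, the five-tone inventory, and the 0.6 threshold are all invented for this example rather than taken from the paper.

```python
import math

# Hypothetical tone inventory (four lexical tones plus neutral) and the extra category.
TONES = ("1", "2", "3", "4", "5")
UNKNOWN = "unknown"


def classify_tone(posteriors, threshold=0.6):
    """Assumed stage 1: return the most likely tone, or UNKNOWN when the
    classifier is not confident enough (threshold is an arbitrary choice)."""
    best = max(TONES, key=lambda t: posteriors[t])
    return best if posteriors[best] >= threshold else UNKNOWN


def tone_score(hyp_tone, posteriors, threshold=0.6):
    """Assumed stage 2: log-score of the hypothesized tone. A syllable whose
    tone is UNKNOWN contributes nothing, so its tone information is not used."""
    if classify_tone(posteriors, threshold) == UNKNOWN:
        return 0.0
    return math.log(max(posteriors[hyp_tone], 1e-10))


def rescore(hypotheses, syllable_posteriors, weight=1.0, threshold=0.6):
    """Return the index of the best hypothesis after adding tone scores.

    hypotheses: list of (base_log_score, [tone per syllable]); all hypotheses
    are assumed to share the same syllable segmentation.
    """
    def total(hyp):
        base, tones = hyp
        tone_part = sum(
            tone_score(t, p, threshold)
            for t, p in zip(tones, syllable_posteriors)
        )
        return base + weight * tone_part

    return max(range(len(hypotheses)), key=lambda i: total(hypotheses[i]))


# Usage: the first syllable has a confident tone, the second is ambiguous.
syllable_posteriors = [
    {"1": 0.9, "2": 0.025, "3": 0.025, "4": 0.025, "5": 0.025},
    {"1": 0.3, "2": 0.3, "3": 0.2, "4": 0.1, "5": 0.1},
]
hypotheses = [(-10.0, ["2", "2"]), (-10.5, ["1", "3"])]
best_index = rescore(hypotheses, syllable_posteriors)
```

In this toy setup, the ambiguous second syllable adds no tone score to either hypothesis, so only the confidently recognized first syllable influences the ranking.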


doi: 10.21437/Interspeech.2008-346

Cite as: Cheng, L.-W., Lee, L.-s. (2008) Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model. Proc. Interspeech 2008, 1137-1140, doi: 10.21437/Interspeech.2008-346

@inproceedings{cheng08b_interspeech,
  author={Li-Wei Cheng and Lin-shan Lee},
  title={{Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1137--1140},
  doi={10.21437/Interspeech.2008-346}
}