Incorporating prosodic information into speech recognition has attracted growing attention in recent years. In this work we present a complete framework for Mandarin speech recognition with prosodic modeling that uses two-level hierarchical prosodic information. We developed three approaches: GMM-based, decision-tree-based, and hybrid. The decision-tree-based prosodic models gave the largest improvements in character recognition accuracy. This approach does not require a training corpus labeled with prosodic features, and it works reasonably well on a large-scale multi-speaker task.
Cite as: Huang, J.-T., Lee, L.-s. (2006) Prosodic modeling in large vocabulary Mandarin speech recognition. Proc. Interspeech 2006, paper 1546-Tue3A2O.5, doi: 10.21437/Interspeech.2006-373
@inproceedings{huang06b_interspeech,
  author={Jui-Ting Huang and Lin-shan Lee},
  title={{Prosodic modeling in large vocabulary Mandarin speech recognition}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1546-Tue3A2O.5},
  doi={10.21437/Interspeech.2006-373}
}