8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Context Dependent Syllable Acoustic Model for Continuous Chinese Speech Recognition

Hao Wu, Xihong Wu

Peking University, China

The choice of the basic modeling unit is a very important issue when building an acoustic model for continuous Mandarin speech recognition [1]. Traditional acoustic modeling methods based on phonemes or Initials/Finals (IFs) usually have limited accuracy in modeling intra-syllable variations and long-range temporal dependencies. In contrast, this paper presents a practical syllable-based approach. Unlike IFs, syllables implicitly model intra-syllable variations with good accuracy. In addition, by carefully choosing context modeling schemes and parameter tying methods, a syllable-based acoustic model can capture longer-range temporal variations while keeping model complexity well controlled. To address the problem of unbalanced training data, approaches based on models with multiple unit sizes are also implemented. Experimental results show that the acoustic model built with the presented syllable-based approach effectively improves the performance of continuous Chinese speech recognition.
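The abstract does not specify the context modeling scheme, the tying method, or how the multiple-sized unit models are combined, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It expands a syllable sequence into context-dependent syllable labels, then backs off to a context-independent syllable, and finally to an Initial/Final pair, whenever a unit has too few training occurrences. The HTK-style `left-centre+right` label notation, the `sil` padding, toneless pinyin, the hypothetical `IF_SPLIT` table, the helper names `cd_syllables`, `count_units`, and `choose_units`, and the `min_count` threshold are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical toneless pinyin -> (Initial, Final) table; illustration only.
IF_SPLIT = {
    "ni": ("n", "i"),
    "hao": ("h", "ao"),
    "ma": ("m", "a"),
    "zhong": ("zh", "ong"),
    "guo": ("g", "uo"),
}


def cd_syllables(syllables):
    """Expand a syllable sequence into HTK-style 'left-centre+right' labels,
    padding both ends with an assumed 'sil' context."""
    padded = ["sil"] + list(syllables) + ["sil"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]


def count_units(corpus):
    """Count context-independent and context-dependent syllables in the training data."""
    counts = Counter()
    for utt in corpus:
        counts.update(utt)                # context-independent syllables
        counts.update(cd_syllables(utt))  # context-dependent syllables
    return counts


def choose_units(syllables, counts, min_count):
    """Assumed back-off rule: use the context-dependent syllable if it is frequent
    enough, else the context-independent syllable, else its Initial/Final pair."""
    units = []
    for syl, cd in zip(syllables, cd_syllables(syllables)):
        if counts[cd] >= min_count:
            units.append(cd)
        elif counts[syl] >= min_count:
            units.append(syl)
        else:
            units.extend(IF_SPLIT[syl])
    return units


# Toy "training corpus" of syllable sequences (made-up data).
corpus = [["ni", "hao"]] * 5 + [["ni", "hao", "ma"]] * 2 + [["zhong", "guo"]]
counts = count_units(corpus)
print(choose_units(["ni", "hao", "ma"], counts, min_count=3))
# ['sil-ni+hao', 'hao', 'm', 'a']
```

In this toy setup the frequent unit `sil-ni+hao` keeps its context, the rarer `ni-hao+ma` falls back to the plain syllable `hao`, and the sparse syllable `ma` falls back to its Initial/Final pair, which is one simple way a mix of unit sizes can trade modeling detail against data sparsity.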


Bibliographic reference.  Wu, Hao / Wu, Xihong (2007): "Context dependent syllable acoustic model for continuous Chinese speech recognition", In INTERSPEECH-2007, 1713-1716.