EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Improved Context-Dependent Acoustic Modeling for Continuous Chinese Speech Recognition

Jiyong Zhang, Fang Zheng, Jing Li, Chunhua Luo, Guoliang Zhang

Tsinghua University, China

This paper describes a new framework for context-dependent (CD) Initial/Final (IF) acoustic modeling that uses decision-tree-based state tying for continuous Chinese speech recognition. Based on the characteristics of the Chinese language, the Extended Initial/Final (XIF) set is chosen as the basic speech recognition unit (SRU) set; it outperforms the standard IF set. After the decision tree has been constructed, an adaptive mixture-increasing strategy is applied when splitting the single Gaussian in each tied state into a Gaussian mixture. Experimental results show that both improvements benefit acoustic modeling for Chinese speech recognition and that the CD XIF model outperforms the baseline syllable model by over 30%.
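The abstract does not specify how the mixture count is adapted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes diagonal-covariance Gaussians, a hypothetical rule that ties the number of components in a tied state to its training frame count (`target_mixtures`, `frames_per_component`, and `max_components` are illustrative names and values), and a common splitting heuristic that halves the weight of the heaviest component and perturbs its means by a fraction of a standard deviation.

```python
import math


def target_mixtures(frame_count, frames_per_component=100, max_components=8):
    """Hypothetical adaptive rule (an assumption, not from the paper):
    tied states with more training frames are allowed more components."""
    return max(1, min(max_components, frame_count // frames_per_component))


def split_heaviest(components, perturb=0.2):
    """Split the heaviest component into two.

    components: list of (weight, means, variances) tuples (diagonal covariance).
    The two children share the parent's variances, each takes half its weight,
    and their means are moved +/- perturb standard deviations from the parent.
    """
    idx = max(range(len(components)), key=lambda i: components[i][0])
    weight, means, variances = components[idx]
    offsets = [perturb * math.sqrt(v) for v in variances]
    upper = (weight / 2, [m + o for m, o in zip(means, offsets)], list(variances))
    lower = (weight / 2, [m - o for m, o in zip(means, offsets)], list(variances))
    return components[:idx] + [upper, lower] + components[idx + 1:]


def grow_state(single_gaussian, frame_count):
    """Grow one tied state from a single Gaussian to its target mixture size.

    A real training loop would re-estimate the parameters (for example with EM)
    after every split; that step is omitted here.
    """
    means, variances = single_gaussian
    components = [(1.0, list(means), list(variances))]
    while len(components) < target_mixtures(frame_count):
        components = split_heaviest(components)
    return components


if __name__ == "__main__":
    # A tied state seen in 350 frames grows to 3 components under the assumed rule.
    for weight, means, variances in grow_state(([0.0, 1.0], [1.0, 4.0]), 350):
        print(weight, means, variances)
```

Under this assumed rule, sparsely trained tied states keep a single Gaussian while well-trained states receive larger mixtures, which is one plausible reading of an "adaptive" strategy.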


Bibliographic reference.  Zhang, Jiyong / Zheng, Fang / Li, Jing / Luo, Chunhua / Zhang, Guoliang (2001): "Improved context-dependent acoustic modeling for continuous Chinese speech recognition", In EUROSPEECH-2001, 1617-1620.