We present an effective method for merging Chinese and English acoustic units to build a language-independent speech recognition system. Chinese, as a tonal language, differs substantially from English. An optimal Chinese phoneme inventory is therefore constructed so that it stays consistent with the representation of English acoustic units. Two approaches to Chinese-English bilingual phoneme modeling are described and compared. The first combines Chinese and English phonemes based on the International Phonetic Alphabet (IPA). The second is a data-driven method based on a confusion matrix. Experimental results show that both approaches are feasible, and that the data-driven method reduces the word error rate (WER) by a relative 0.73% for Chinese and 3.76% for English compared with the IPA-based method. As a by-product, sharing data across languages yields a relative error reduction of 8.7% under noisy conditions.
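To make the data-driven idea concrete, the following is a minimal sketch of how cross-language phoneme pairs might be chosen from a confusion matrix. It is not the procedure from the paper, whose details are not given in the abstract. The function name `merge_by_confusion`, the matrix layout (rows are Chinese phonemes, columns are English phonemes), the greedy one-to-one pairing, and the `threshold` parameter are all assumptions made for illustration.

```python
import numpy as np


def merge_by_confusion(confusion, zh_phones, en_phones, threshold=0.3):
    """Greedily pair Chinese and English phonemes by confusion rate.

    Hypothetical layout: confusion[i][j] counts how often Chinese phoneme i
    was recognized as English phoneme j. Returns (zh, en, rate) tuples.
    """
    confusion = np.asarray(confusion, dtype=float)
    row_sums = confusion.sum(axis=1, keepdims=True)
    # Turn counts into per-row confusion rates; rows with no counts stay zero.
    rates = np.divide(confusion, row_sums,
                      out=np.zeros_like(confusion), where=row_sums > 0)

    pairs = []
    used_zh, used_en = set(), set()
    # Visit candidate pairs from most to least confusable.
    for flat in np.argsort(rates, axis=None)[::-1]:
        i, j = (int(k) for k in np.unravel_index(flat, rates.shape))
        if rates[i, j] < threshold:
            break  # all remaining pairs are below the merge threshold
        if i in used_zh or j in used_en:
            continue  # assumed constraint: each phoneme merges at most once
        pairs.append((zh_phones[i], en_phones[j], float(rates[i, j])))
        used_zh.add(i)
        used_en.add(j)
    return pairs


# Toy counts, invented purely for illustration.
toy_counts = [[8, 1, 1],
              [2, 2, 6],
              [3, 4, 3]]
toy_pairs = merge_by_confusion(toy_counts, ["a", "i", "u"],
                               ["AA", "IY", "UW"], threshold=0.5)
```

Phonemes left unpaired would stay language-specific in the merged inventory. A real system would likely use a symmetric or model-based similarity measure rather than raw row-normalized counts.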
Cite as: Yang, L., Zhang, J., Yan, Y. (2007) Acoustic units selection in Chinese-English bilingual speech recognition. Proc. ITRW on Nonlinear Speech Processing (NOLISP 2007), 96-99
@inproceedings{yang07_nolisp, author={Lin Yang and Jianping Zhang and Yonghong Yan}, title={{Acoustic units selection in Chinese-English bilingual speech recognition}}, year=2007, booktitle={Proc. ITRW on Nonlinear Speech Processing (NOLISP 2007)}, pages={96--99} }