Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Real-Time Multilingual HMM Training Robust to Channel Variations

Ea-Ee Jan, Jaime Botella Ordinas, George Saon, Salim Roukos

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

This paper describes our efforts towards a real-time, multilingual Large Vocabulary Continuous Speech Recognition (LVCSR) system for telephony. A trilingual (English, French and Spanish) hybrid landline/cellular system is compared to each of our best monolingual systems. The results are comparable: the degradation is less than approximately 10%. An HMM state quality measurement technique is explored to improve the performance of multilingual acoustic models. A pilot experiment on an English/Spanish bilingual system gives very good results, with improvements of between 5% and 20% depending on the test condition. To extend the system to speakerphone applications, we applied additional front-end processing techniques, mainly CDCN prior to HDA and MLLT, which reduce the error rate of the trilingual system by as much as 30%. These results suggest that trilingual acoustic models can be used in real telephony applications.



Bibliographic reference.  Jan, E. E. / Botella Ordinas, Jaime / Saon, George / Roukos, Salim (2000): "Real-time multilingual HMM training robust to channel variations", In ICSLP-2000, vol.3, 925-928.