11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Cross-Lingual Speaker Adaptation via Gaussian Component Mapping

Houwei Cao, Tan Lee, P. C. Ching

Chinese University of Hong Kong, China

This paper focuses on using acoustic information from an existing source language (Cantonese) to perform speaker adaptation for a new target language (English). Speaker-independent (SI) model mapping between Cantonese and English is investigated at different levels of acoustic units, with phones, states, and Gaussian mixture components each used as the mapping unit. Once the models are mapped, cross-lingual speaker adaptation can be performed. The performance of the proposed cross-lingual speaker adaptation system is determined by two factors: model mapping effectiveness and speaker adaptation effectiveness. Experimental results show that model mapping effectiveness increases as the mapping units become finer, and that speaker adaptation effectiveness depends on model mapping effectiveness. Mapping between Gaussian mixture components is shown to be effective for various speech recognition tasks. Compared with the SI English recognizer, a relative word error reduction of 10.12% on English is achieved using only a small amount (4 minutes) of Cantonese adaptation data.
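The abstract does not state how Gaussian components are paired across languages or how the adapted Cantonese parameters are carried over to English. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes diagonal-covariance Gaussians, pairs each English (target) Gaussian with its nearest Cantonese (source) Gaussian by symmetric Kullback-Leibler divergence, and then transfers adaptation with a simple mean-shift rule. The function names `map_components` and `transfer_mean_shift` and the toy parameter values are hypothetical.

```python
import numpy as np


def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence, used here (an assumption) as the mapping distance."""
    return kl_diag(mu_p, var_p, mu_q, var_q) + kl_diag(mu_q, var_q, mu_p, var_p)


def map_components(tgt_means, tgt_vars, src_means, src_vars):
    """Hypothetical mapping: index of the nearest source Gaussian for each target Gaussian."""
    mapping = []
    for mu_t, var_t in zip(tgt_means, tgt_vars):
        dists = [
            symmetric_kl(mu_t, var_t, mu_s, var_s)
            for mu_s, var_s in zip(src_means, src_vars)
        ]
        mapping.append(int(np.argmin(dists)))
    return np.array(mapping)


def transfer_mean_shift(tgt_means, src_means_si, src_means_adapted, mapping):
    """Hypothetical transfer: add each mapped source Gaussian's adaptation shift
    (adapted mean minus SI mean) to the corresponding target mean."""
    shift = src_means_adapted - src_means_si
    return tgt_means + shift[mapping]


# Toy SI models: two Cantonese (source) Gaussians, three English (target) Gaussians.
src_means = np.array([[0.0, 0.0], [5.0, 5.0]])
src_vars = np.ones((2, 2))
tgt_means = np.array([[4.5, 5.2], [0.3, -0.1], [0.1, 0.2]])
tgt_vars = np.ones((3, 2))

mapping = map_components(tgt_means, tgt_vars, src_means, src_vars)

# Pretend speaker adaptation on Cantonese data moved the source means.
src_adapted = src_means + np.array([[0.5, 0.0], [-1.0, 1.0]])
adapted_tgt = transfer_mean_shift(tgt_means, src_means, src_adapted, mapping)
print(mapping.tolist())   # [1, 0, 0]
print(adapted_tgt)
```

Here the first English Gaussian lies near the second Cantonese Gaussian and therefore inherits its shift of (-1.0, 1.0), while the other two inherit the shift of the first Cantonese Gaussian. Finer mapping units give each target Gaussian a closer source partner, which is consistent with the abstract's observation that mapping effectiveness improves as the units are refined.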


Bibliographic reference.  Cao, Houwei / Lee, Tan / Ching, P. C. (2010): "Cross-lingual speaker adaptation via Gaussian component mapping", In INTERSPEECH-2010, 869-872.