This paper is focused on the use of acoustic information from an existing source language (Cantonese) to implement speaker adaptation for a new target language (English). Speaker-independent (SI) model mapping between Cantonese and English is investigated at different levels of acoustic units. Phones, states, and Gaussian mixture components are used as the mapping units respectively. With the model mapping, cross-lingual speaker adaptation can be performed. The performance of the proposed cross-lingual speaker adaptation system is determined by two factors: model mapping effectiveness and speaker adaptation effectiveness. Experimental results show that the model mapping effectiveness increased with the refinement of mapping units, and the speaker adaptation effectiveness depends on the model mapping effectiveness. Mapping between Gaussian mixture components is proved effective for various speech recognition tasks. A relative error reduction of 10.12% on English words is achieved by using a small amount of (4 minutes) Cantonese adaptation data, compared with the SI English recognizer.
Bibliographic reference. Cao, Houwei / Lee, Tan / Ching, P. C. (2010): "Cross-lingual speaker adaptation via Gaussian component mapping", In INTERSPEECH-2010, 869-872.