10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Effects of Language Mixing for Automatic Recognition of Cantonese-English Code-Mixing Utterances

Houwei Cao, P. C. Ching, Tan Lee

Chinese University of Hong Kong, China

While automatic speech recognition of either Cantonese or English alone has achieved a great degree of success, recognition of Cantonese-English code-mixing speech remains far from trivial. This paper analyzes the effect of language mixing on the recognition performance of code-mixing utterances. By examining the recognition results of Cantonese-English code-mixing speech, where Cantonese is the matrix language and English is the embedded language, we find that recognition accuracy of the embedded language plays a significant role in the overall performance. In particular, significant performance degradation is found in the matrix language if the embedded words cannot be recognized correctly. We also study the error propagation effect of the embedded English. The results show that errors in embedded English words may propagate to two neighboring Cantonese syllables. Finally, analysis is carried out to determine the factors that influence recognition performance on embedded English.
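The kind of error propagation analysis described above can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes each utterance has already been aligned against its reference and reduced to a list of (language, correct) pairs, where "C" marks a Cantonese syllable and "E" an embedded English word. The function name `accuracy_by_distance`, the data layout, and the toy utterances are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_distance(utterances, max_dist=3):
    """Cantonese accuracy grouped by distance to the nearest English token.

    utterances: list of utterances, each a list of (lang, correct) pairs,
    with lang "C" (Cantonese syllable) or "E" (embedded English word).
    Returns {(english_correct, distance): accuracy}.
    """
    # counts[(english_correct, distance)] = [number correct, number seen]
    counts = defaultdict(lambda: [0, 0])
    for utt in utterances:
        eng_positions = [i for i, (lang, _) in enumerate(utt) if lang == "E"]
        if not eng_positions:
            continue  # monolingual utterance, nothing to measure
        for i, (lang, ok) in enumerate(utt):
            if lang != "C":
                continue
            # Nearest embedded English token (first one wins on ties).
            j = min(eng_positions, key=lambda p: abs(p - i))
            dist = abs(j - i)
            if dist > max_dist:
                continue
            bucket = counts[(utt[j][1], dist)]
            bucket[0] += int(ok)
            bucket[1] += 1
    return {key: n_ok / n for key, (n_ok, n) in counts.items()}


# Toy data for illustration only; these are not results from the paper.
toy = [
    [("C", True), ("C", False), ("E", False), ("C", False), ("C", True)],
    [("C", True), ("C", True), ("E", True), ("C", True), ("C", True)],
]
result = accuracy_by_distance(toy)
```

Comparing the entries keyed by `(False, d)` with those keyed by `(True, d)` for small `d` would show whether Cantonese syllables close to a misrecognized English word are themselves misrecognized more often.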

Bibliographic reference.  Cao, Houwei / Ching, P. C. / Lee, Tan (2009): "Effects of language mixing for automatic recognition of Cantonese-English code-mixing utterances", In INTERSPEECH-2009, 3011-3014.