Interspeech'2005 - Eurospeech
This paper describes the design and compilation of the CUMIX Cantonese-English code-mixing speech corpus. Code-mixing is a common phenomenon in many bilingual societies, and it usually involves at least two different languages within one utterance. In Hong Kong, people usually mix English words and phrases with Cantonese in their daily conversation. Although there are many monolingual corpora of Cantonese and of English, no code-mixing speech database of these two languages is available. The corpus is developed to support the study of the effect of Cantonese accents in English, the design of effective language boundary detection algorithms for code-mixing utterances, and the evaluation of the performance of code-mixing speech recognizers.
Bibliographic reference. Chan, Joyce Y. C. / Ching, P. C. / Lee, Tan (2005): "Development of a Cantonese-English code-mixing speech corpus", In INTERSPEECH-2005, 1533-1536.