ISCA Archive Interspeech 2006

Automatic speech recognition of Cantonese-English code-mixing utterances

Joyce Y. C. Chan, P. C. Ching, Tan Lee, Houwei Cao

This paper describes our recent work on developing a large-vocabulary, speaker-independent, continuous speech recognition system for Cantonese-English code-mixing utterances. Both acoustic modeling and language modeling are discussed in detail. For acoustic modeling, the Cantonese accent in English words is handled by applying cross-lingual acoustic units and by modifying the pronunciation dictionary. Statistical language models are built from a small amount of text data, because data collection is subject to many limitations. Language boundary detection based on language identification algorithms is applied and gives a slight improvement in overall accuracy. The recognition accuracies for Chinese characters and English words in the code-mixing utterances are 56.37% and 52.99%, respectively.


doi: 10.21437/Interspeech.2006-29

Cite as: Chan, J.Y.C., Ching, P.C., Lee, T., Cao, H. (2006) Automatic speech recognition of Cantonese-English code-mixing utterances. Proc. Interspeech 2006, paper 1065-Mon1BuP.3, doi: 10.21437/Interspeech.2006-29

@inproceedings{chan06_interspeech,
  author={Joyce Y. C. Chan and P. C. Ching and Tan Lee and Houwei Cao},
  title={{Automatic speech recognition of Cantonese-English code-mixing utterances}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1065-Mon1BuP.3},
  doi={10.21437/Interspeech.2006-29}
}