ISCA Archive SLTU 2014

Features for factored language models for Code-Switching speech

Heike Adel, Katrin Kirchhoff, Dominic Telaar, Ngoc Thang Vu, Tim Schlippe, Tanja Schultz

This paper investigates features that can be used to predict Code-Switching speech. For this task, factored language models are applied and integrated into a state-of-the-art decoder. Several possible factors are explored: words, part-of-speech tags, Brown word clusters, open-class words and open-class word clusters. We find that Brown word clusters, part-of-speech tags and open-class words are most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME. In decoding experiments, the model combining Brown word clusters and part-of-speech tags and the model additionally including open-class word clusters yield the best mixed error rate results. In summary, the factored language models reduce the perplexity on the SEAME evaluation set by up to 10.8% relative and the mixed error rate by up to 3.4% relative.
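To make the factored-language-model idea concrete, the following minimal Python sketch illustrates how factors of the preceding token (word, part-of-speech tag, Brown cluster) can be used as histories with a simple generalized backoff. This is not the authors' implementation (which integrates the models into a speech decoder and uses the SEAME corpus); the toy data, factor annotations, backoff order and discount are all invented for illustration only.

from collections import defaultdict
import math

# Toy Code-Switching-style sentences annotated with factors:
# each token is (word, pos_tag, brown_cluster). All data here is invented.
TRAIN = [
    [("i", "PRN", "c1"), ("want", "VRB", "c2"), ("to", "PRT", "c3"),
     ("chi1 fan4", "VRB", "c2"), ("now", "ADV", "c4")],
    [("we", "PRN", "c1"), ("go", "VRB", "c2"), ("to", "PRT", "c3"),
     ("xue2 xiao4", "NN", "c5"), ("today", "ADV", "c4")],
]
TEST = [
    [("i", "PRN", "c1"), ("go", "VRB", "c2"), ("to", "PRT", "c3"),
     ("xue2 xiao4", "NN", "c5"), ("now", "ADV", "c4")],
]

class FactoredBigramLM:
    """Bigram-style factored LM: P(w_t | factors of w_{t-1}).

    Backoff path (an arbitrary choice for this sketch):
    previous word -> previous POS tag -> previous Brown cluster -> unigram.
    """

    def __init__(self, train, alpha=0.4):
        self.alpha = alpha                      # stupid-backoff style discount
        self.counts = defaultdict(lambda: defaultdict(int))
        self.unigrams = defaultdict(int)
        self.total = 0
        for sent in train:
            prev = ("<s>", "<s>", "<s>")
            for tok in sent:
                word = tok[0]
                # one count table per factor type of the previous token
                for idx, name in enumerate(("word", "pos", "cluster")):
                    self.counts[(name, prev[idx])][word] += 1
                self.unigrams[word] += 1
                self.total += 1
                prev = tok

    def prob(self, word, prev):
        # back off through the factor hierarchy until some history has counts
        weight = 1.0
        for idx, name in enumerate(("word", "pos", "cluster")):
            hist = self.counts[(name, prev[idx])]
            if word in hist:
                return weight * hist[word] / sum(hist.values())
            weight *= self.alpha
        # final fallback: add-one smoothed unigram
        return weight * (self.unigrams[word] + 1) / (self.total + len(self.unigrams) + 1)

    def perplexity(self, sents):
        logp, n = 0.0, 0
        for sent in sents:
            prev = ("<s>", "<s>", "<s>")
            for tok in sent:
                logp += math.log(self.prob(tok[0], prev))
                n += 1
                prev = tok
        return math.exp(-logp / n)

lm = FactoredBigramLM(TRAIN)
print("toy perplexity:", round(lm.perplexity(TEST), 2))

The point of the sketch is the backoff over factor histories rather than over shortened word histories: when the previous word has not been seen, the model can still condition on its part-of-speech tag or Brown cluster, which is what makes such features attractive for sparse Code-Switching data.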

Index Terms: language modeling, factored language models, Code-Switching speech


Cite as: Adel, H., Kirchhoff, K., Telaar, D., Vu, N.T., Schlippe, T., Schultz, T. (2014) Features for factored language models for Code-Switching speech. Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014), 32-38

@inproceedings{adel14_sltu,
  author={Heike Adel and Katrin Kirchhoff and Dominic Telaar and Ngoc Thang Vu and Tim Schlippe and Tanja Schultz},
  title={{Features for factored language models for Code-Switching speech}},
  year=2014,
  booktitle={Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014)},
  pages={32--38}
}