Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)
St. Petersburg, Russia
This paper investigates features that can be used to predict Code-Switching speech. For this task, factored language models are applied and integrated into a state-of-the-art decoder. Several possible factors are explored: words, part-of-speech tags, Brown word clusters, open class words, and open class word clusters. We find that Brown word clusters, part-of-speech tags, and open class words are most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME. In decoding experiments, the model combining Brown word clusters and part-of-speech tags and the model that additionally includes open class word clusters yield the best mixed error rate results. In summary, the factored language models reduce the perplexity on the SEAME evaluation set by up to 10.8% relative and the mixed error rate by up to 3.4% relative.
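To illustrate the general idea behind factored language models, the sketch below shows a toy model in which each token carries two factors, the word itself and a part-of-speech tag, and the next-word probability backs off from the word factor to the POS factor to a unigram. This is a simplified illustration under assumed data, not the paper's actual model or decoder integration; the tiny tagged corpus and backoff order are hypothetical.

```python
# Toy factored language model (FLM) sketch: two factors per token
# (word, POS tag), with backoff word -> POS -> unigram.
# Illustrative only; the corpus and backoff chain are assumptions.
from collections import Counter, defaultdict

tagged = [("i", "PRP"), ("like", "VB"), ("tea", "NN"),
          ("you", "PRP"), ("like", "VB"), ("coffee", "NN")]

unigram = Counter(w for w, _ in tagged)
bigram_word = defaultdict(Counter)  # previous word -> next-word counts
bigram_pos = defaultdict(Counter)   # previous POS  -> next-word counts
for (w1, t1), (w2, _) in zip(tagged, tagged[1:]):
    bigram_word[w1][w2] += 1
    bigram_pos[t1][w2] += 1

def flm_prob(word, prev_word, prev_pos):
    """Back off from the word factor to the POS factor to the unigram."""
    if bigram_word[prev_word]:
        counts = bigram_word[prev_word]
    elif bigram_pos[prev_pos]:
        counts = bigram_pos[prev_pos]
    else:
        counts = unigram
    return counts[word] / sum(counts.values())
```

For a seen history the word factor is used, e.g. `flm_prob("tea", "like", "VB")` gives 0.5; for an unseen previous word such as `"enjoy"`, the model backs off to the POS factor, which is precisely the kind of generalization that makes factor-based features attractive for sparse Code-Switching data.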
Index Terms: language modeling, factored language models, Code-Switching speech
Bibliographic reference. Adel, Heike / Kirchhoff, Katrin / Telaar, Dominic / Vu, Ngoc Thang / Schlippe, Tim / Schultz, Tanja (2014): "Features for factored language models for Code-Switching speech", In SLTU-2014, 32-38.