To train a code-switching language model for mixed-language speech recognition, we propose assigning weights to the sentence pairs in the parallel text data. The code-switching language model, which is composed of a code-switching boundary prediction model, a code-switching translation model, and a reconstruction model, is incorporated with a language for mixed-language speech recognition. The code-switching translation model, which is trained on selected subsets of the sentence pairs in the parallel text data, allows the decoder to decide whether a phrase is in the matrix language or in the embedded language. We further propose a weighting procedure for training the code-switching translation model. We evaluate our methods on Mandarin-English code-switching lecture speech and lunch conversations. Compared with the conventional interpolated language model, our proposed method reduces word error rate by a statistically significant 1.74% on the lecture speech and by 1.29% on the lunch conversations.
Bibliographic reference. Li, Ying / Fung, Pascale (2013): "Language modeling for mixed language speech recognition using weighted phrase extraction", In INTERSPEECH-2013, 2599-2603.