This paper investigates unsupervised language model adaptation on a new task: Mandarin broadcast conversation transcription. N-gram adaptation yields a 1.1% absolute character error rate gain, and continuous-space language model adaptation with PLSA and LDA brings a 1.3% absolute gain. Moreover, a broadcast news language model trained on a large amount of data alone under-performs a model that additionally includes a small amount of broadcast conversation data by 1.8% absolute character error rate. Although the broadcast news and broadcast conversation tasks are related, this result shows a large mismatch between them. In addition, it was found that N-gram adaptation enables reliable detection of broadcast news versus broadcast conversation data.
Cite as: Mrva, D., Woodland, P.C. (2006) Unsupervised language model adaptation for Mandarin broadcast conversation transcription. Proc. Interspeech 2006, paper 1549-Thu1A2O.3, doi: 10.21437/Interspeech.2006-574
@inproceedings{mrva06_interspeech,
  author={David Mrva and Philip C. Woodland},
  title={{Unsupervised language model adaptation for Mandarin broadcast conversation transcription}},
  year={2006},
  booktitle={Proc. Interspeech 2006},
  pages={paper 1549-Thu1A2O.3},
  doi={10.21437/Interspeech.2006-574}
}