ISCA Archive Interspeech 2006

Unsupervised language model adaptation for Mandarin broadcast conversation transcription

David Mrva, Philip C. Woodland

This paper investigates unsupervised language model adaptation on a new task, Mandarin broadcast conversation transcription. N-gram adaptation yields a 1.1% absolute reduction in character error rate, and continuous space language model adaptation with PLSA and LDA yields a 1.3% absolute reduction. Moreover, a broadcast news language model trained on a large amount of data alone underperforms a model that also includes a small amount of broadcast conversation data by 1.8% absolute character error rate. Although the broadcast news and broadcast conversation tasks are related, this result shows that they are substantially mismatched. In addition, N-gram adaptation makes it possible to reliably detect broadcast news and broadcast conversation data.


doi: 10.21437/Interspeech.2006-574

Cite as: Mrva, D., Woodland, P.C. (2006) Unsupervised language model adaptation for Mandarin broadcast conversation transcription. Proc. Interspeech 2006, paper 1549-Thu1A2O.3, doi: 10.21437/Interspeech.2006-574

@inproceedings{mrva06_interspeech,
  author={David Mrva and Philip C. Woodland},
  title={{Unsupervised language model adaptation for Mandarin broadcast conversation transcription}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1549-Thu1A2O.3},
  doi={10.21437/Interspeech.2006-574}
}