ISCA Archive ICSLP 2000

Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora

Guergana Savova, Michael Schonwetter, Sergey Pakhomov

We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated language models (ILM) with literal (LILM), semi-literal (SILM), and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.
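
As a rough illustration of what interpolating language models and measuring perplexity involve, the following minimal sketch linearly mixes two add-one-smoothed unigram models and scores held-out text. It is not the authors' implementation: the abstract does not state the n-gram order, the smoothing, or the interpolation weights, so the unigram models, the weight `lam`, the function names, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_a, p_b, lam):
    """Linear interpolation: lam * p_a(w) + (1 - lam) * p_b(w)."""
    return {w: lam * p_a[w] + (1 - lam) * p_b[w] for w in p_a}


def perplexity(model, tokens):
    """exp of the average negative log-probability per token."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Hypothetical toy corpora (not data from the paper).
finished = "the patient was seen today the patient is stable".split()
literal = "um the patient uh was seen today period".split()
held_out = "uh the patient is stable period".split()
vocab = set(finished) | set(literal) | set(held_out)

p_fin = unigram_model(finished, vocab)  # stands in for the large finished-text model
p_lit = unigram_model(literal, vocab)   # stands in for the small literal-corpus model
p_mix = interpolate(p_fin, p_lit, lam=0.5)  # lam=0.5 is an arbitrary assumption

for name, model in [("finished", p_fin), ("literal", p_lit), ("mixed", p_mix)]:
    print(name, round(perplexity(model, held_out), 2))
```

In practice the interpolation weight would typically be tuned on held-out data rather than fixed, and the component models would be higher-order n-gram models.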


Cite as: Savova, G., Schonwetter, M., Pakhomov, S. (2000) Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 206-209

@inproceedings{savova00_icslp,
  author={Guergana Savova and Michael Schonwetter and Sergey Pakhomov},
  title={{Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 206-209}
}