Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Improving Language Model Perplexity and Recognition Accuracy for Medical Dictations Via Within-Domain Interpolation with Literal and Semi-Literal Corpora

Guergana Savova, Michael Schonwetter, Sergey Pakhomov

Lernout and Hauspie, MN, USA

We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated language models (ILM) with literal (LILM), semi-literal (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.
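
As a rough illustration of the general idea of interpolating two language models and comparing perplexity, the sketch below mixes two toy unigram models linearly. It is not the authors' implementation: the abstract does not state the model order, the interpolation scheme, or the weights, so the linear mixture, the add-one smoothing, the weight `lam=0.5`, the toy sentences, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_a, p_b, lam):
    """Linear interpolation: lam * p_a + (1 - lam) * p_b, word by word."""
    return {w: lam * p_a[w] + (1 - lam) * p_b[w] for w in p_a}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy stand-ins (hypothetical data): a large "finished text" corpus, a small
# "literal" transcript corpus, and held-out dictation text.
finished = "the patient was seen today the patient is stable".split()
literal = "um the patient was seen today period next line".split()
held_out = "the patient is stable period".split()
vocab = set(finished) | set(literal) | set(held_out)

p_fin = unigram_model(finished, vocab)
p_lit = unigram_model(literal, vocab)
p_mix = interpolate(p_lit, p_fin, lam=0.5)  # lam is an assumed weight

for name, model in [("finished", p_fin), ("literal", p_lit), ("mixed", p_mix)]:
    print(name, round(perplexity(model, held_out), 2))
```

In practice the interpolation weight would be tuned on held-out data rather than fixed; the fixed value here only keeps the sketch short.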


Bibliographic reference.  Savova, Guergana / Schonwetter, Michael / Pakhomov, Sergey (2000): "Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora", In ICSLP-2000, vol.1, 206-209.