Sixth International Conference on Spoken Language Processing (ICSLP 2000)
October 16-20, 2000
Improving Language Model Perplexity and Recognition Accuracy for Medical Dictations Via Within-Domain Interpolation with Literal and Semi-Literal Corpora
Guergana Savova, Michael Schonwetter, Sergey Pakhomov
Lernout and Hauspie, MN, USA
We propose a technique for improving language modeling for
automated speech recognition of medical dictations by
interpolating finished text (25M words) with small human-generated
literal and/or machine-generated semi-literal corpora.
By building and testing interpolated language models (ILM) with literal (LILM),
semi-literal (SILM) and partial (PILM) corpora, we show that
both perplexity and recognition results improve significantly
with LILM and SILM, with the two yielding very close results.
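The abstract does not state the interpolation scheme, model order, or smoothing method. A common choice is linear interpolation, P(w) = lam * P_finished(w) + (1 - lam) * P_literal(w), with lam tuned by EM on held-out text. The following is a minimal sketch under those assumptions. It uses unigram models with additive (Lidstone) smoothing so it stays short, whereas practical recognizers use higher-order n-gram models. The function names (lidstone_unigram, interpolate, em_weight) and all data below are hypothetical illustrations, not the authors' implementation or corpora.

```python
import math
from collections import Counter


def lidstone_unigram(tokens, vocab, alpha=0.1):
    """Unigram model with additive (Lidstone) smoothing over a fixed vocabulary."""
    counts = Counter(tokens)
    denom = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}


def interpolate(p_a, p_b, lam):
    """Linear interpolation: P(w) = lam * P_a(w) + (1 - lam) * P_b(w)."""
    return {w: lam * p_a[w] + (1.0 - lam) * p_b[w] for w in p_a}


def perplexity(model, tokens):
    """2 ** (average negative log2 probability per token)."""
    log_sum = sum(math.log2(model[w]) for w in tokens)
    return 2.0 ** (-log_sum / len(tokens))


def em_weight(p_a, p_b, heldout, lam=0.5, iters=100):
    """Re-estimate lam by EM so that the mixture maximizes held-out likelihood."""
    for _ in range(iters):
        # Posterior probability that each held-out token was generated by model A.
        post = [lam * p_a[w] / (lam * p_a[w] + (1.0 - lam) * p_b[w]) for w in heldout]
        lam = sum(post) / len(post)
    return lam


# Hypothetical toy data: the "finished" reports are large but contain no spoken
# artifacts; the "literal" transcript is tiny but keeps fillers ("um", "uh")
# and spoken punctuation ("period").
finished = ("the patient denies chest pain . the abdomen is soft . "
            "lungs are clear . " * 40).split()
literal = "um the patient denies uh chest pain period".split()
heldout = "um the abdomen is soft period".split()
test = "uh the lungs are clear period".split()

vocab = set(finished) | set(literal) | set(heldout) | set(test)
p_fin = lidstone_unigram(finished, vocab)
p_lit = lidstone_unigram(literal, vocab)

lam = em_weight(p_fin, p_lit, heldout)
p_mix = interpolate(p_fin, p_lit, lam)

ppl_fin = perplexity(p_fin, test)
ppl_lit = perplexity(p_lit, test)
ppl_mix = perplexity(p_mix, test)
print(f"lambda={lam:.3f}  finished={ppl_fin:.1f}  "
      f"literal={ppl_lit:.1f}  interpolated={ppl_mix:.1f}")
```

In this toy setup, each source covers what the other lacks. The finished text supplies the clinical vocabulary, and the literal transcript supplies dictation artifacts. As a result, the interpolated model has lower test perplexity than either source alone. The weight is fit on held-out text rather than the test set so that the comparison stays fair.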
Savova, Guergana / Schonwetter, Michael / Pakhomov, Sergey (2000):
"Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora",
In ICSLP-2000, vol. 1, pp. 206-209.