ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
We present our recent progress in filled pause (FP) modeling for a highly spontaneous medical transcription task. Our studies confirm that FP modeling is an important topic for spontaneous speech applications and must be addressed explicitly in acoustic, lexical, and language modeling. We provide a framework for data-driven lexical modeling of FP acoustic variability with respect to phonemic realization and duration. By using a number of properly weighted FP pronunciation variants of variable length and applying FP-specific acoustic models, we achieved an 8% relative reduction in word error rate. We also tested different approaches for handling FP in the language model and for integrating FP into the decoder. The best results, in terms of both perplexity and word error rate, were achieved by predicting FP probabilistically and removing it from the language model history. This approach reduces perplexity by 4% and yields a further gain in word accuracy.
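The language-model strategy that worked best, predicting an FP with some probability while keeping it out of the history used to predict later words, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: a bigram model stands in for their language model, the FP probability `p_fp` is a single context-independent constant (the text does not say how FP probabilities are conditioned), and the names `FP`, `log_prob`, and `perplexity` are hypothetical.

```python
import math

FP = "<fp>"  # hypothetical token representing any filled pause ("uh", "um", ...)


def log_prob(words, bigram, p_fp, start="<s>"):
    """Natural-log probability of `words` under a toy bigram model.

    Assumed scheme (illustrative only):
      * an FP is predicted with a fixed probability p_fp at every position;
      * any other word w gets (1 - p_fp) * P(w | h), so the next-token
        distribution still sums to 1 if each bigram row sums to 1;
      * the history h is the last non-FP word, so FP never enters the context.
    """
    history = start
    total = 0.0
    for w in words:
        if w == FP:
            total += math.log(p_fp)
            # History is deliberately NOT updated: the FP is skipped.
        else:
            total += math.log((1.0 - p_fp) * bigram[(history, w)])
            history = w
    return total


def perplexity(words, bigram, p_fp):
    """Per-token perplexity of `words` (FP tokens included in the count)."""
    return math.exp(-log_prob(words, bigram, p_fp) / len(words))
```

With a toy table such as `bigram = {("<s>", "the"): 0.5, ("the", "patient"): 0.2}`, scoring `["the", FP, "patient"]` still looks up `("the", "patient")`, because the intervening filled pause is removed from the history. The only cost of the FP is the extra factor `p_fp`, rather than an unreliable `(FP, "patient")` context that would fragment the training statistics.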
Bibliographic reference. Schramm, Hauke / Aubert, Xavier L. / Meyer, Carsten / Peters, Jochen (2003): "Filled-pause modeling for medical transcriptions", in SSPR-2003, paper TMO6.