COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction
University of East Anglia, Norwich, UK
Automatic transcription of spontaneously dictated medical records has great potential to improve the quality and reduce the cost of patient care in Norwegian hospitals. In this paper, we describe the design of an evaluation database for this task and study the occurrence of typical disfluencies. Furthermore, we study the improvements in word accuracy obtained by the use of speaker adaptation and different methods of modeling speaker-generated noise.
Explicit modeling of speaker-generated noise gave a 6.5% improvement across all speakers, while the improvements for the individual speakers ranged up to 14%. Speaker adaptation gave an additional 20% improvement across all speakers, while the improvements for the individual speakers ranged up to 34%.
The best overall word accuracy is still only approximately 50%, which is far below the requirement for a practical system. It is, however, expected that a considerable improvement could be achieved by training an appropriate language model.
Bibliographic reference. Gajic, Bojana / Markhus, Vidar / Pettersen, Svein Gunnar / Johnsen, Magne Hallstein (2004): "Automatic recognition of spontaneously dictated medical records for Norwegian", In Robust2004, paper 43.