Speech recognition has become increasingly popular in radiology reporting
over the last decade. However, developing a speech recognition system
for a new language in a highly specific domain requires substantial resources,
expert knowledge and skills. As a result, commercial vendors do not offer
ready-made radiology speech recognition systems for less-resourced
languages.
This paper describes the implementation of a radiology speech
recognition system for Estonian, a language with fewer than one million
native speakers. The system was developed in partnership with a hospital
that provided a corpus of written reports for language modeling purposes.
Rewrite rules for pre-processing training texts and post-processing
recognition results were written manually with the Thrax toolkit, based
on a small parallel corpus created by the hospital’s radiologists.
Deep neural network-based acoustic models were trained on 216 hours of
out-of-domain data and adapted on 14 hours of spoken radiology data,
using the Kaldi toolkit. The current word error rate
of the system is 5.4%. The system is in active use in a real clinical
environment.
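For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recognized text and a reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; the function name and the English example sentences are hypothetical, and it is not the scoring procedure used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and not necessarily how the paper scored its system.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion over five words.
    print(word_error_rate("no acute fracture is seen", "no acute fractures seen"))
```

Under this definition, a word error rate of 5.4% corresponds to roughly one word-level error for every 18 to 19 reference words.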
Cite as: Alumäe, T., Paats, A., Fridolin, I., Meister, E. (2017) Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software. Proc. Interspeech 2017, 2168-2172, doi: 10.21437/Interspeech.2017-928
@inproceedings{alumae17_interspeech,
  author={Tanel Alumäe and Andrus Paats and Ivo Fridolin and Einar Meister},
  title={{Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2168--2172},
  doi={10.21437/Interspeech.2017-928}
}