Do experienced speech recognition users achieve high accuracy rates because their systems have taught them successful speaking styles? We report an experiment to quantify this "speaker training" effect. In our experiment, 30 computer-literate elderly speakers (15 male, 15 female) with no previous ASR experience were given 2 hours of intensive training in using a speech recognition system. Before and after this training session, they were asked to read separate 520-word texts. Measuring the word error rates (WERs) on these "before training" and "after training" recordings, we find a small but statistically significant improvement. Before training, speakers had an average WER of 20.9%, and after training, 19.8%. We examine changes in speaking rate, phrase length, and SNR and their impact on WER. This improvement is surprisingly small; anecdotal evidence suggests that experienced ASR users have substantially higher accuracy than novices. The effect may be larger for more extensive training.
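For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference text, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring tool used in the paper (which the abstract does not specify). The function names and the example sentences are hypothetical; only the 20.9% and 19.8% figures come from the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which the error rate dropped, relative to `before`."""
    return (before - after) / before


# Hypothetical sentences, for illustration only (one substitution, one deletion).
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Figures reported in the abstract: 20.9% before training, 19.8% after.
reported_gain = relative_reduction(20.9, 19.8)
```

With the reported averages, the absolute improvement is 1.1 percentage points, which works out to a relative WER reduction of roughly 5%.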
Cite as: Anderson, S., Liberman, N., Gillick, L., Foster, S., Hama, S. (1999) The effects of speaker training on ASR accuracy. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 403-406, doi: 10.21437/Eurospeech.1999-104
@inproceedings{anderson99_eurospeech,
  author={Stephen Anderson and Natalie Liberman and Larry Gillick and Stephen Foster and Sahoko Hama},
  title={{The effects of speaker training on ASR accuracy}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={403--406},
  doi={10.21437/Eurospeech.1999-104}
}