We describe a novel method for tuning the decoding parameters of a speech-to-text system so as to minimize word error rate (WER) subject to an overall time constraint. When applied to three sub-realtime systems for recognizing English conversational telephone speech, the method gave speed improvements of up to 21.1% while also reducing WER by up to 6.7%.
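The abstract states the tuning problem (minimize WER subject to a time budget) but not the search procedure. As a generic illustration of that problem statement only, and not the paper's method, here is a minimal sketch of an exhaustive search that keeps the lowest-WER setting among those that fit the budget. The parameter names `beam` and `max_active`, the candidate values, and the `toy_decode` cost model are all hypothetical stand-ins; a real setup would run the recognizer on a development set instead.

```python
from itertools import product


def toy_decode(beam, max_active):
    """Hypothetical stand-in for one decoder run on a dev set.

    Returns (wer_percent, realtime_factor). Both formulas are invented
    for illustration: wider search lowers WER but costs more time.
    """
    wer = 20.0 + 40.0 / beam + 2000.0 / max_active
    rt_factor = 0.05 * beam + max_active / 20000.0
    return wer, rt_factor


def tune(beams, actives, time_budget):
    """Return the feasible setting with the lowest WER, or None.

    A setting is feasible when its realtime factor is within time_budget.
    The result is a tuple (wer, rt_factor, beam, max_active).
    """
    best = None
    for beam, max_active in product(beams, actives):
        wer, rt_factor = toy_decode(beam, max_active)
        if rt_factor > time_budget:
            continue  # violates the time constraint
        if best is None or wer < best[0]:
            best = (wer, rt_factor, beam, max_active)
    return best


best = tune(beams=[8, 10, 12, 14], actives=[2000, 4000, 8000], time_budget=0.95)
print(best)
```

Exhaustive search is used here only because it is easy to read; with more than a few parameters it becomes impractical, since every candidate setting requires a full decoding pass.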
Cite as: Colthurst, T., Arvizo, T., Kao, C.-L., Kimball, O., Lowe, S.A., Miller, D.R.H., Van Sciver, J. (2007) Parameter tuning for fast speech recognition. Proc. Interspeech 2007, 1477-1480, doi: 10.21437/Interspeech.2007-428
@inproceedings{colthurst07_interspeech,
  author={Thomas Colthurst and Tresi Arvizo and Chia-Lin Kao and Owen Kimball and Stephen A. Lowe and David R. H. Miller and Jim Van Sciver},
  title={{Parameter tuning for fast speech recognition}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1477--1480},
  doi={10.21437/Interspeech.2007-428}
}