In 1997, NIST introduced a voting scheme called ROVER for combining word scripts produced by different speech recognizers. This approach has achieved a relative word error reduction of up to 20% when used to combine the systems' outputs from the 1998 and 1999 Broadcast News evaluations. Recently, there has been increasing interest in using this technique. This paper provides an analysis of several modifications of the original algorithm. Topics addressed are the order of combination, normalization/filtering of the systems' outputs prior to combining them, treatment of ties during voting, and the incorporation of language model information. The modified ROVER achieves an additional 5% relative word error reduction on the 1998 and 1999 Broadcast News evaluation test sets. Links with recent theoretical work on alternative error measures are also discussed.
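As a rough illustration of the voting idea mentioned in the abstract, the following minimal sketch picks one word per position by majority vote. It is not the ROVER implementation or the modified algorithm from the paper: it assumes the system outputs have already been aligned into slots (the alignment step is omitted), and the names `vote`, `NULL`, and the tie rule (the earliest system in a fixed order wins) are hypothetical choices made only for this example.

```python
from collections import Counter

NULL = ""  # hypothetical marker meaning "this system has no word in this slot"


def vote(aligned_slots):
    """Pick one word per slot by simple majority vote.

    aligned_slots: list of slots; each slot is a list with one entry per
    system, always in the same system order. Ties go to the word proposed
    by the earliest system in that order (an assumption of this sketch).
    """
    result = []
    for slot in aligned_slots:
        counts = Counter(slot)
        best = max(counts.values())
        # First word, in system order, that reaches the top count.
        winner = next(word for word in slot if counts[word] == best)
        if winner != NULL:  # a winning NULL means no word is emitted
            result.append(winner)
    return result


# Three hypothetical systems, already aligned into four slots.
slots = [
    ["the", "the", "a"],
    ["cat", "cat", "cat"],
    ["sat", "sad", "sat"],
    [NULL, "down", NULL],
]
combined = vote(slots)
```

Because ties are resolved by system order in this sketch, the order in which systems are listed affects the result, which is one reason the order of combination and the treatment of ties are worth studying.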
Cite as: Schwenk, H., Gauvain, J.-L. (2000) Combining multiple speech recognizers using voting and language model information. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 915-918, doi: 10.21437/ICSLP.2000-419
@inproceedings{schwenk00_icslp,
  author={Holger Schwenk and Jean-Luc Gauvain},
  title={{Combining multiple speech recognizers using voting and language model information}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 2, 915-918},
  doi={10.21437/ICSLP.2000-419}
}