Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Combining Multiple Speech Recognizers Using Voting and Language Model Information
Holger Schwenk, Jean-Luc Gauvain
LIMSI-CNRS, Orsay, France
In 1997, NIST introduced a voting scheme called ROVER for
combining the word transcriptions produced by different speech recognizers.
This approach has achieved a relative word error reduction
of up to 20% when used to combine the systems' outputs from
the 1998 and 1999 Broadcast News evaluations. Recently, there
has been increasing interest in using this technique. This paper
provides an analysis of several modifications of the original algorithm.
Topics addressed are the order of combination, normalization/filtering
of the systems' outputs prior to combining them,
treatment of ties during voting and the incorporation of language
model information. The modified ROVER achieves an additional
5% relative word error reduction on the 1998 and 1999 Broadcast
News evaluation test sets. Links with recent theoretical work on
alternative error measures are also discussed.
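The abstract names the technique without spelling it out, so here is a minimal Python sketch of ROVER-style combination. It is simplified and is not NIST's implementation or the authors' modified algorithm. Each hypothesis is aligned in turn to a growing word network using unit-cost Levenshtein dynamic programming. Each slot of the network then picks a word by frequency voting. Because `hyps[0]` serves as the alignment base and the others are added in order, the result can depend on the order of combination. The `@` null token, the functions `align`, `rover` and `lm_tie_break`, the `tie_break` hook and the `BIGRAMS` table are all hypothetical. The hook only marks where language model information could break ties. Word-confidence scores, which ROVER can also use, are omitted.

```python
"""Minimal ROVER-style combination sketch (simplified; not NIST's or the authors' code)."""
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

NULL = "@"  # hypothetical placeholder meaning "this system output no word here"
Slot = List[str]  # one entry per system combined so far


def align(network: List[Slot], hyp: List[str], n_prev: int) -> List[Slot]:
    """Align one hypothesis to the network with unit-cost Levenshtein DP."""
    m, n = len(network), len(hyp)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i
    for j in range(1, n + 1):
        cost[0][j] = j

    def sub(i: int, j: int) -> int:
        # Matching is free if any earlier system put this word in the slot.
        return 0 if hyp[j - 1] in network[i - 1] else 1

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(i, j),  # match / substitution
                             cost[i - 1][j] + 1,  # new system skips this slot
                             cost[i][j - 1] + 1)  # new word opens a new slot
    # Backtrace from the bottom-right cell, building the extended network.
    out: List[Slot] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(i, j):
            out.append(network[i - 1] + [hyp[j - 1]])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            out.append(network[i - 1] + [NULL])
            i -= 1
        else:
            out.append([NULL] * n_prev + [hyp[j - 1]])
            j -= 1
    out.reverse()
    return out


TieBreak = Callable[[List[str], List[str]], str]


def rover(hyps: List[List[str]], tie_break: Optional[TieBreak] = None) -> List[str]:
    """Combine word hypotheses by frequency voting; hyps[0] is the alignment base."""
    network: List[Slot] = [[w] for w in hyps[0]]
    for k, hyp in enumerate(hyps[1:], start=1):
        network = align(network, hyp, n_prev=k)
    output: List[str] = []
    for slot in network:
        counts = Counter(slot)
        top = max(counts.values())
        # Tied words in system order (dict.fromkeys keeps first occurrence).
        candidates = list(dict.fromkeys(w for w in slot if counts[w] == top))
        if len(candidates) > 1 and tie_break is not None:
            winner = tie_break(output, candidates)
        else:
            winner = candidates[0]  # default: earliest system wins a tie
        if winner != NULL:
            output.append(winner)
    return output


# Hypothetical toy bigram counts standing in for a real language model.
BIGRAMS: Dict[Tuple[str, str], int] = {("<s>", "a"): 3, ("<s>", "the"): 1}


def lm_tie_break(history: List[str], candidates: List[str]) -> str:
    """Pick the tied candidate with the highest toy bigram count."""
    prev = history[-1] if history else "<s>"
    return max(candidates, key=lambda w: BIGRAMS.get((prev, w), 0))


if __name__ == "__main__":
    systems = ["the cat sat on the mat".split(),
               "the cat sat on a mat".split(),
               "a cat sat on the mat".split()]
    print(" ".join(rover(systems)))  # the cat sat on the mat
    print(rover([["the", "cat"], ["a", "cat"]]))  # ['the', 'cat']
    print(rover([["the", "cat"], ["a", "cat"]], tie_break=lm_tie_break))  # ['a', 'cat']
```

In the two-system demo, both words in the first slot get one vote each. Without a tie-breaker the sketch falls back to the first system's word. With the toy bigram hook it picks the word the "language model" prefers after the sentence start. The sketch requires at least one hypothesis.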
Schwenk, Holger / Gauvain, Jean-Luc (2000):
"Combining multiple speech recognizers using voting and language model information",
In ICSLP-2000, vol.2, 915-918.