Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Combining Multiple Speech Recognizers Using Voting and Language Model Information

Holger Schwenk, Jean-Luc Gauvain

LIMSI-CNRS, Orsay, France

In 1997, NIST introduced a voting scheme called ROVER for combining word scripts produced by different speech recognizers. This approach has achieved a relative word error reduction of up to 20% when used to combine the systems' outputs from the 1998 and 1999 Broadcast News evaluations. Recently, there has been increasing interest in using this technique. This paper provides an analysis of several modifications of the original algorithm. Topics addressed are the order of combination, normalization/filtering of the systems' outputs prior to combining them, treatment of ties during voting, and the incorporation of language model information. The modified ROVER achieves an additional 5% relative word error reduction on the 1998 and 1999 Broadcast News evaluation test sets. Links with recent theoretical work on alternative error measures are also discussed.
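
As a rough illustration of the voting and tie-handling ideas mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the recognizer outputs have already been aligned into slots (ROVER builds this alignment itself, which is why the order of combination matters), and it uses an empty string as a hypothetical "no word" marker. The function name `vote` and the `tie_breaker` callback, which stands in for a language model score, are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, List, Optional

NULL = ""  # hypothetical marker for "no word" in an aligned slot

def vote(
    slots: List[List[str]],
    tie_breaker: Optional[Callable[[List[str], str], float]] = None,
) -> List[str]:
    """Pick one word per aligned slot by majority vote.

    slots[i][k] is the word proposed by system k at position i.
    tie_breaker(history, word) is an assumed scoring hook (for example a
    language model score); the highest-scoring tied candidate wins.
    """
    result: List[str] = []
    for slot in slots:
        counts = Counter(slot)
        best = max(counts.values())
        # Candidates sharing the top count, in system order
        # (Counter keeps insertion order in Python 3.7+).
        tied = [w for w in counts if counts[w] == best]
        if len(tied) > 1 and tie_breaker is not None:
            history = list(result)
            winner = max(tied, key=lambda w: tie_breaker(history, w))
        else:
            # Without a tie breaker, the earliest system's word wins.
            winner = tied[0]
        if winner != NULL:
            result.append(winner)
    return result
```

Called without a tie breaker, `vote([["sat", "sad", ""]])` keeps the first system's word; passing a scoring function lets context decide among tied candidates instead, which is the general role the abstract assigns to language model information.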

Bibliographic reference. Schwenk, Holger / Gauvain, Jean-Luc (2000): "Combining multiple speech recognizers using voting and language model information", in ICSLP-2000, vol.2, 915-918.