11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

An Empirical Comparison of the T3, Juicer, HDecode and Sphinx3 Decoders

Josef R. Novak (1), Paul R. Dixon (2), Sadaoki Furui (1)

(1) Tokyo Institute of Technology, Japan
(2) NICT, Japan

In this paper we compare the T3 weighted finite-state transducer (WFST) decoder against three other speech recognition decoders on three tasks of varying difficulty. We show that T3 performs favorably against several established decoders, namely the Juicer WFST decoder, Sphinx3, and HDecode, in terms of real-time factor (RTF) versus word accuracy. In addition to comparing decoder performance, we evaluate both Sphinx and HTK acoustic models on a common footing inside T3, and show that the speed benefits that typically accompany the WFST approach increase with the size of the vocabulary and of the other input knowledge sources. In the case of T3, we also show that GPU acceleration can significantly extend these gains.
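For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how they are conventionally computed. It is not taken from the paper: the function names are hypothetical, RTF is assumed to be decoding time divided by audio duration, and word accuracy is assumed to be (N - S - D - I) / N, obtained here from a word-level edit distance.

```python
def real_time_factor(decode_seconds, audio_seconds):
    """RTF: decoding time divided by audio duration (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


def word_accuracy(reference, hypothesis):
    """Word accuracy (N - S - D - I) / N, computed via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    errors = prev[-1]  # minimum S + D + I
    return 1.0 - errors / len(ref)
```

Under these assumptions, a decoder that takes 30 seconds to process 60 seconds of audio has an RTF of 0.5. Word accuracy can fall below zero when the hypothesis contains many insertions.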


Bibliographic reference. Novak, Josef R. / Dixon, Paul R. / Furui, Sadaoki (2010): "An empirical comparison of the T3, Juicer, HDecode and Sphinx3 decoders", In INTERSPEECH-2010, 1890-1893.