13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Estimating Word-Stability During Incremental Speech Recognition

Ian McGraw (1), Alexander Gruenstein (2)

(1) Massachusetts Institute of Technology, Cambridge, MA, USA; (2) Google, USA

Many speech user interfaces can be improved by incrementally displaying or interpreting a speech recognizer's current best path as a user speaks. This gives rise to a problem of instability, whereby the best path may change frequently, particularly with respect to the words most recently spoken. Introducing a lag between the audio most recently processed and the portion of the best path shown to the user can lead to more usable incremental results. In the ideal case, the lag introduced would vary so as to recover exactly the longest stable prefix of the best path. In this paper, we introduce a framework for estimating a stability statistic for each word, and explore the tradeoff between stability and lag by thresholding stability statistics estimated using a variety of features.
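The abstract contrasts a fixed lag with a lag that adapts to per-word stability. The short sketch below is meant only to make those ideas concrete, and it rests on several assumptions. It assumes a word counts as "stable" when the partial hypothesis agrees with the final recognition result up to and including that word. It also assumes stability scores in [0, 1] are already supplied by some estimator. The abstract does not give the paper's features or estimator, so none are reproduced here. Finally, it assumes that thresholding means showing words from left to right until the first word whose score falls below the threshold. The function names `common_prefix_len`, `fixed_lag_prefix`, and `thresholded_prefix` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def common_prefix_len(partial: Sequence[str], final: Sequence[str]) -> int:
    """Oracle (hypothetical): count the leading words of a partial hypothesis
    that match the final result, i.e. the longest stable prefix in hindsight."""
    n = 0
    for p, f in zip(partial, final):
        if p != f:
            break
        n += 1
    return n


def fixed_lag_prefix(partial: Sequence[str], lag: int) -> List[str]:
    """Baseline (hypothetical): always hide the `lag` most recent words."""
    return list(partial[: max(len(partial) - lag, 0)])


def thresholded_prefix(
    partial: Sequence[str], stability: Sequence[float], threshold: float
) -> List[str]:
    """Assumed variable-lag rule: show words left to right, stopping at the
    first word whose estimated stability is below `threshold`."""
    shown: List[str] = []
    for word, score in zip(partial, stability):
        if score < threshold:
            break
        shown.append(word)
    return shown


if __name__ == "__main__":
    partial = ["the", "cat", "sad"]          # current best path (illustrative)
    final = ["the", "cat", "sat", "down"]    # eventual final result (illustrative)
    scores = [0.95, 0.9, 0.4]                # made-up stability estimates
    print(common_prefix_len(partial, final))          # 2
    print(fixed_lag_prefix(partial, 1))               # ['the', 'cat']
    print(thresholded_prefix(partial, scores, 0.5))   # ['the', 'cat']
```

The example values are invented for illustration and are not results from the paper. Raising `threshold` trades added lag (fewer words shown) for fewer displayed words that later change. Lowering it does the reverse. This is the stability/lag tradeoff the abstract describes.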

Bibliographic reference.  McGraw, Ian / Gruenstein, Alexander (2012): "Estimating word-stability during incremental speech recognition", In INTERSPEECH-2012, 1019-1022.