Many speech user interfaces can be improved by incrementally displaying or interpreting a speech recognizer's current best path as a user speaks. This gives rise to a problem of instability, whereby the best path may change frequently, particularly with respect to the words most recently spoken. Introducing a lag between the audio most recently processed and the portion of the best path shown to the user can lead to more usable incremental results. In the ideal case, the lag introduced would vary to recover exactly the longest stable prefix of the best path. In this paper, we introduce a framework for estimating a stability statistic for each word, and explore the tradeoff between stability and lag by thresholding stability statistics estimated using a variety of features.
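The thresholding idea can be illustrated with a minimal sketch. It assumes each word of the current best path already carries a stability score in [0, 1]; how such scores are estimated from features is not described in this abstract, so the scores, the function name `stable_prefix`, and the threshold values below are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence


def stable_prefix(
    words: Sequence[str],
    stability: Sequence[float],
    threshold: float,
) -> List[str]:
    """Return the longest prefix whose words all score at or above threshold.

    Hypothetical helper: `stability` holds one assumed score per word,
    where a higher score means the word is less likely to change later.
    """
    if len(words) != len(stability):
        raise ValueError("words and stability must have the same length")

    prefix: List[str] = []
    for word, score in zip(words, stability):
        # Stop at the first word judged unstable; later words are withheld
        # even if they score highly, because only a prefix is displayed.
        if score < threshold:
            break
        prefix.append(word)
    return prefix


# Made-up scores for illustration: recently spoken words score lower.
best_path = ["call", "john", "smith", "at", "home"]
scores = [0.98, 0.95, 0.90, 0.60, 0.30]

shown = stable_prefix(best_path, scores, threshold=0.8)
lag_in_words = len(best_path) - len(shown)
```

Under these assumed scores, raising the threshold withholds more of the recent words (more lag, fewer visible revisions), while lowering it shows more of the best path at the risk of displaying words that later change.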
Bibliographic reference. McGraw, Ian / Gruenstein, Alexander (2012): "Estimating word-stability during incremental speech recognition", In INTERSPEECH-2012, 1019-1022.