For dialogue systems to become robust, they must be able to detect disfluencies accurately and with minimal latency. To meet this challenge, here we frame incremental disfluency detection as a word-by-word tagging task and, following their recent success in Spoken Language Understanding tasks, we test the performance of Recurrent Neural Networks (RNNs). We experiment with different inputs for RNNs to explore the effect of context on their ability to detect edit terms and repair disfluencies effectively. Although not eclipsing the state of the art in terms of utterance-final performance, RNNs achieve good detection results, requiring no feature engineering and using simple input vectors representing the incoming utterance as their training input. Furthermore, RNNs show very good incremental properties with low latency and very good output stability, surpassing previously reported results in these measures.
Bibliographic reference. Hough, Julian / Schlangen, David (2015): "Recurrent neural networks for incremental disfluency detection", In INTERSPEECH-2015, 849-853.