5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

A Prosody-Only Decision-Tree Model for Disfluency Detection

Elizabeth Shriberg (1), Rebecca Bates (2), Andreas Stolcke (1)

(1) Speech Technology and Research Laboratory, SRI International, Menlo Park, California, USA
(2) Dept. of Electrical Engineering, Boston University, Boston, Massachusetts, USA

Speech disfluencies (filled pauses, repetitions, repairs, and false starts) are pervasive in spontaneous speech. Detecting and correcting them automatically is important for effective natural language understanding and for improving speech models in general. Previous approaches to disfluency detection have relied heavily on lexical information, which makes them less applicable when word recognition is unreliable. We have developed a disfluency detection method based on decision tree classifiers that use only local, automatically extracted prosodic features. Because the model does not rely on lexical information, it remains applicable even when word recognition is unreliable. The model performed significantly better than chance at detecting four disfluency types. When the correct transcription was given, it also outperformed a language model in the detection of false starts. For false starts, combining the prosody model with a specialized language model improved accuracy over either model alone. The results suggest that a prosody-only model can aid the automatic detection of disfluencies in spontaneous speech.

Bibliographic reference. Shriberg, Elizabeth / Bates, Rebecca / Stolcke, Andreas (1997): "A prosody-only decision-tree model for disfluency detection", In EUROSPEECH-1997, 2383-2386.