14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

A Sequential Repetition Model for Improved Disfluency Detection

Mari Ostendorf, Sangyun Hahn

University of Washington, USA

This paper proposes a new method for automatically detecting disfluencies in spontaneous speech, specifically self-corrections, that explicitly models repetitions separately from other disfluencies. We show that, in a corpus of Supreme Court oral arguments, repetition disfluencies can be longer and more stutter-like than the short repetitions observed in the Switchboard corpus, and we suggest that they can be better represented with a flat structure that covers the full repeated sequence. Because these disfluencies are relatively easy to detect, weakly supervised training is an effective way to minimize labeling costs. By explicitly modeling repetitions, we improve general disfluency detection both within and across domains, and we provide a richer transcript.
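To illustrate what a "flat structure that covers the full sequence" might look like, here is a minimal sketch that marks an entire run of consecutively repeated words (e.g. "i i i" or "it is it is it is") as one span, rather than splitting it into nested reparandum/repair pairs. This is a simple string-matching heuristic written for illustration only, not the authors' model; the function name `find_repetition_spans` and the `max_n` limit on repeated n-gram length are hypothetical.

```python
from typing import List, Tuple


def find_repetition_spans(tokens: List[str], max_n: int = 3) -> List[Tuple[int, int, int]]:
    """Return flat repetition spans as (start, end, n) tuples (end exclusive).

    Hypothetical heuristic: a span is a maximal run of two or more consecutive
    copies of the same n-gram (1 <= n <= max_n). The whole run is returned as
    a single flat span instead of a chain of nested repetitions.
    """
    spans: List[Tuple[int, int, int]] = []
    i = 0
    while i < len(tokens):
        best = None  # (start, end, n) with the longest coverage starting at i
        for n in range(1, max_n + 1):
            unit = tokens[i:i + n]
            if len(unit) < n:
                break  # not enough tokens left for an n-gram
            j = i + n
            # Extend while the next n tokens repeat the unit exactly.
            while j + n <= len(tokens) and tokens[j:j + n] == unit:
                j += n
            # Require at least two copies; prefer longer coverage,
            # and on ties keep the smaller n (found first).
            if j > i + n and (best is None or j - i > best[1] - best[0]):
                best = (i, j, n)
        if best is not None:
            spans.append(best)
            i = best[1]  # skip past the whole repeated sequence
        else:
            i += 1
    return spans


if __name__ == "__main__":
    print(find_repetition_spans("i i i think the the court".split()))
    print(find_repetition_spans("it is it is it is fine".split()))
```

On "i i i think the the court" this yields one span for the three-word stutter and one for "the the", and on "it is it is it is fine" it yields a single span of three bigram copies. Because such spans can be found with little ambiguity, a heuristic like this could in principle supply automatic labels for weakly supervised training, though the paper's actual labeling procedure may differ.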


Bibliographic reference. Ostendorf, Mari / Hahn, Sangyun (2013): "A sequential repetition model for improved disfluency detection", In INTERSPEECH-2013, 2624-2628.