This paper proposes a new method for automatically detecting disfluencies in spontaneous speech, specifically self-corrections, that explicitly models repetitions separately from other disfluencies. We show that, in a corpus of Supreme Court oral arguments, repetition disfluencies can be longer and more stutter-like than the short repetitions observed in the Switchboard corpus, and we suggest that they are better represented with a flat structure that covers the full repeated sequence. Because these disfluencies are relatively easy to detect, weakly supervised training is an effective way to minimize labeling costs. By explicitly modeling repetitions, we improve general disfluency detection both within and across domains, and we produce a richer transcript.
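To make the idea of a flat repetition structure concrete, the sketch below scans a token sequence for runs of two or more consecutive copies of the same n-gram. It labels each run as one flat span rather than as a nest of pairwise repairs. This is a simple rule-based illustration under stated assumptions, not the authors' sequential model. The function names (`count_copies`, `find_repetitions`, `flat_tags`), the `max_n` limit, and the `REP`/`O` tag set are all hypothetical. Following the usual convention that the final copy is the fluent one, every copy except the last is tagged `REP`.

```python
from typing import List, Tuple

# Hypothetical span type: (start, end, n-gram length, number of copies)
Span = Tuple[int, int, int, int]


def count_copies(tokens: List[str], start: int, n: int) -> int:
    """Count consecutive copies of the n-gram beginning at `start`."""
    unit = tokens[start:start + n]
    if len(unit) < n:
        return 0
    copies, pos = 1, start + n
    while tokens[pos:pos + n] == unit:
        copies += 1
        pos += n
    return copies


def find_repetitions(tokens: List[str], max_n: int = 3) -> List[Span]:
    """Greedy left-to-right scan for maximal repetition runs.

    Each run is returned as ONE flat span covering all copies, e.g.
    "i i i" -> a single span of three copies, not nested pairs.
    """
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        best = None  # (coverage, n, copies); ties keep the smaller n
        for n in range(1, max_n + 1):
            copies = count_copies(tokens, i, n)
            if copies >= 2 and (best is None or copies * n > best[0]):
                best = (copies * n, n, copies)
        if best:
            coverage, n, copies = best
            spans.append((i, i + coverage, n, copies))
            i += coverage
        else:
            i += 1
    return spans


def flat_tags(tokens: List[str], max_n: int = 3) -> List[str]:
    """Tag all but the last copy of each repetition run as 'REP'."""
    tags = ["O"] * len(tokens)
    for start, end, n, _copies in find_repetitions(tokens, max_n):
        for k in range(start, end - n):
            tags[k] = "REP"
    return tags


if __name__ == "__main__":
    toks = "i i i think it it is to the to the store".split()
    print(find_repetitions(toks))
    print(flat_tags(toks))
```

For the example utterance, the scan finds three flat spans: "i i i" (three copies of a unigram), "it it" (two copies), and "to the to the" (two copies of a bigram). A real system would also need to handle partial words, fillers between copies, and repetitions that are not disfluent, which this heuristic ignores.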
Bibliographic reference. Ostendorf, Mari / Hahn, Sangyun (2013): "A sequential repetition model for improved disfluency detection", In INTERSPEECH-2013, 2624-2628.