15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

A CRF-Based Approach to Automatic Disfluency Detection in a French Call-Centre Corpus

Camille Dutrey (1), Chloé Clavel (2), Sophie Rosset (3), Ioana Vasilescu (3), Martine Adda-Decker (3)

(1) EDF, France
(2) LTCI, France
(3) LIMSI, France

In this paper, we present a Conditional Random Field (CRF) based approach for the automatic detection of edit disfluencies in a French conversational telephone corpus. We define disfluency patterns using both linguistic and acoustic features. Two related tasks are considered. The first aims at detecting the disfluent speech portion proper, or reparandum, i.e. the portion to be removed to improve the readability of transcribed data. The second additionally aims at identifying the corrected portion, or repair, which can be useful in follow-up discourse and dialogue analyses or in opinion mining. For both tasks, we present comparative results as a function of the type of features involved (acoustic and/or linguistic). Overall, the best results are obtained by CRF models combining both acoustic and linguistic features.
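As an illustration only, the two tasks can be framed as token-level sequence labelling, which is the usual input format for a CRF. The sketch below is not the authors' implementation: the BIO tag names (`RM` for reparandum, `RP` for repair), the helper names `bio_labels` and `token_features`, the two features shown, and the example utterance with its pause durations are all assumptions made up for this example. It shows how one linguistic cue (a word repeated shortly afterwards) and one acoustic cue (the silence before a word) might sit side by side in a per-token feature dictionary.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def bio_labels(n_tokens: int, reparandum: Span,
               repair: Optional[Span] = None) -> List[str]:
    """Build BIO labels for one utterance.

    Task 1 (hypothetical encoding): only the reparandum is tagged.
    Task 2 (hypothetical encoding): the repair is tagged as well.
    """
    labels = ["O"] * n_tokens
    spans = [("RM", reparandum)]
    if repair is not None:
        spans.append(("RP", repair))
    for tag, (start, end) in spans:
        for i in range(start, end):
            labels[i] = ("B-" if i == start else "I-") + tag
    return labels


def token_features(tokens: List[str], pauses: List[float],
                   i: int, window: int = 3) -> Dict[str, object]:
    """Combine one linguistic and one acoustic cue for token i.

    Both features are illustrative assumptions, not the paper's feature set.
    """
    word = tokens[i].lower()
    following = [t.lower() for t in tokens[i + 1:i + 1 + window]]
    return {
        "word": word,
        # Linguistic cue: the same word recurs within the next few tokens.
        "repeated_soon": word in following,
        # Acoustic cue: silence (in seconds) preceding the token.
        "pause_before": pauses[i],
    }


# Invented example utterance: "je veux je voudrais un rendez-vous"
tokens = ["je", "veux", "je", "voudrais", "un", "rendez-vous"]
pauses = [0.0, 0.05, 0.40, 0.02, 0.03, 0.01]  # made-up values

task1 = bio_labels(len(tokens), reparandum=(0, 2))
task2 = bio_labels(len(tokens), reparandum=(0, 2), repair=(2, 4))
features = [token_features(tokens, pauses, i) for i in range(len(tokens))]
```

The per-utterance feature lists and label lists produced this way are the kind of paired sequences that a CRF toolkit would typically be trained on; which toolkit and which features were actually used is described in the full paper, not here.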


Bibliographic reference.  Dutrey, Camille / Clavel, Chloé / Rosset, Sophie / Vasilescu, Ioana / Adda-Decker, Martine (2014): "A CRF-based approach to automatic disfluency detection in a French call-centre corpus", In INTERSPEECH-2014, 2897-2901.