In this paper, we present a Conditional Random Field (CRF) based approach for the automatic detection of edit disfluencies in a French conversational telephone corpus. We define disfluency patterns using both linguistic and acoustic features to perform disfluency detection. Two related tasks are considered: the first task aims at detecting the disfluent speech portion proper, or reparandum, i.e., the portion to be removed if we want to improve the readability of transcribed data; in the second task, we also aim to identify the corrected portion, or repair, which can be useful in follow-up discourse and dialogue analyses or in opinion mining. For these two tasks, we present comparative results as a function of the type of features involved (acoustic and/or linguistic). Generally speaking, the best results are obtained by CRF models combining both acoustic and linguistic features.
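To make the two task definitions concrete, the following is a minimal sketch of how per-token inputs and target labels for such a sequence labeller might be laid out. It is not the authors' implementation: the feature names (`repeated_within_3`, `pause_after`, `duration`), the BIO tag scheme (`RM` for reparandum, `RP` for repair), and the example utterance with its acoustic values are all assumptions made up for illustration. No CRF toolkit is invoked; the sketch only prepares feature dictionaries and labels of the kind such toolkits typically accept.

```python
from typing import Dict, List, Tuple, Union

Token = Dict[str, Union[str, float]]
Features = Dict[str, Union[str, float, bool]]
Span = Tuple[int, int, str]  # (start, end_exclusive, "RM" or "RP"), hypothetical format


def token_features(tokens: List[Token], i: int,
                   use_linguistic: bool = True,
                   use_acoustic: bool = True) -> Features:
    """Build one feature dict for token i; feature families can be toggled."""
    tok = tokens[i]
    feats: Features = {"bias": 1.0}
    if use_linguistic:
        word = str(tok["word"]).lower()
        # Words in a look-ahead window of three tokens (assumed window size).
        following = [str(t["word"]).lower() for t in tokens[i + 1:i + 4]]
        feats["word"] = word
        feats["pos"] = tok["pos"]
        # Assumed cue: a word that reappears shortly after may start a reparandum.
        feats["repeated_within_3"] = word in following
    if use_acoustic:
        # Assumed acoustic cues: following pause and token duration, in seconds.
        feats["pause_after"] = tok["pause_after"]
        feats["duration"] = tok["duration"]
    return feats


def utterance_features(tokens: List[Token], **kwargs: bool) -> List[Features]:
    """Feature dicts for every token of one utterance."""
    return [token_features(tokens, i, **kwargs) for i in range(len(tokens))]


def bio_labels(n_tokens: int, spans: List[Span], include_repair: bool) -> List[str]:
    """Task 1 labels the reparandum only; task 2 also labels the repair."""
    labels = ["O"] * n_tokens
    for start, end, kind in spans:
        if kind == "RP" and not include_repair:
            continue
        for j in range(start, end):
            labels[j] = ("B-" if j == start else "I-") + kind
    return labels


# Invented example: "je veux je voudrais un billet" (values are illustrative only).
TOKENS: List[Token] = [
    {"word": "je", "pos": "PRO", "pause_after": 0.00, "duration": 0.12},
    {"word": "veux", "pos": "VER", "pause_after": 0.35, "duration": 0.30},
    {"word": "je", "pos": "PRO", "pause_after": 0.00, "duration": 0.10},
    {"word": "voudrais", "pos": "VER", "pause_after": 0.00, "duration": 0.42},
    {"word": "un", "pos": "DET", "pause_after": 0.00, "duration": 0.09},
    {"word": "billet", "pos": "NOM", "pause_after": 0.00, "duration": 0.38},
]
SPANS: List[Span] = [(0, 2, "RM"), (2, 4, "RP")]
```

Calling `bio_labels(len(TOKENS), SPANS, include_repair=False)` gives the task-1 targets, and `include_repair=True` gives the task-2 targets. Switching `use_linguistic` or `use_acoustic` off in `utterance_features` mirrors the comparison between feature types described in the abstract.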
Bibliographic reference. Dutrey, Camille / Clavel, Chloé / Rosset, Sophie / Vasilescu, Ioana / Adda-Decker, Martine (2014): "A CRF-based approach to automatic disfluency detection in a French call-centre corpus", In INTERSPEECH-2014, 2897-2901.