This paper investigates automatic detection of different types of self-repairs in spontaneous speech across different social contexts, from casual conversations to government hearings. The work shows that a simple CRF-based model is effective for cross-domain training, which is important for contexts where annotated data is not available. The approach explicitly represents common types of disfluencies observed in multi-domain data, both in the model state space and in the extracted features. In addition, the model incorporates an expanded state space for recognizing the repair structure, unlike prior work that annotates only the reparandum.
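The expanded state space for repair structure can be illustrated with a minimal sketch. A self-repair is conventionally decomposed into a reparandum (the material to be replaced), an optional interregnum (editing terms such as "uh" or "I mean"), and the repair itself; a sequence model with separate states for each span can recover the full structure rather than the reparandum alone. The label names and the toy utterance below are illustrative assumptions, not the paper's exact tagging scheme.

```python
# Hypothetical expanded label set for self-repair detection
# (illustrative only; the paper's actual state names may differ):
#   RM = reparandum, IM = interregnum, RP = repair, O = fluent speech.

def extract_repair(tokens, labels):
    """Group a labeled token sequence into repair-structure spans."""
    spans = {"reparandum": [], "interregnum": [], "repair": []}
    name = {"RM": "reparandum", "IM": "interregnum", "RP": "repair"}
    for tok, lab in zip(tokens, labels):
        if lab in name:
            spans[name[lab]].append(tok)
    return spans

# Toy self-repair: "I want a flight to Boston uh I mean to Denver"
tokens = "I want a flight to Boston uh I mean to Denver".split()
labels = ["O", "O", "O", "O", "RM", "RM", "IM", "IM", "IM", "RP", "RP"]
print(extract_repair(tokens, labels))
# → {'reparandum': ['to', 'Boston'],
#    'interregnum': ['uh', 'I', 'mean'],
#    'repair': ['to', 'Denver']}
```

A tagger restricted to a single "disfluent" state could mark only the reparandum span; distinguishing the three states lets the model exploit the parallelism between reparandum and repair as a feature.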
Bibliographic reference: Zayats, Victoria / Ostendorf, Mari / Hajishirzi, Hannaneh (2014): "Multi-domain disfluency and repair detection", in Proceedings of INTERSPEECH 2014, pp. 2907–2911.