15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Variable Span Disfluency Detection in ASR Transcripts

Rahul Gupta (1), Sankaranarayanan Ananthakrishnan (2), Zhaojun Yang (1), Shrikanth S. Narayanan (1)

(1) University of Southern California, USA
(2) Raytheon BBN Technologies, USA

Natural conversations often contain disfluencies such as revisions, repetitions, interjections and filled pauses. This paper focuses on word and phrase repetitions and revisions that are lexically well formed. An automatic speech recognition (ASR) system generally captures these, but they pose problems for downstream processing such as spoken language translation (SLT). We describe a system that identifies such word-level disfluencies, with the goal of removing them from the ASR output in real time. We use a span-based training system to exploit contextual information while tagging disfluencies. We design our system on the oracle transcripts and test it on both reference and ASR transcripts. We achieve an area under the receiver operating characteristic (ROC) curve for word-level disfluency detection of 0.93 on the reference transcripts and 0.87 on the ASR transcripts.
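
As an illustration of the evaluation metric only, the sketch below computes the area under the ROC curve for word-level disfluency tags. It is not the authors' system: the function name `roc_auc`, the example utterance, its labels and its scores are all hypothetical, and the scores stand in for whatever per-word confidence a detector might emit.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a disfluent word, 0 for a fluent word.
    scores: detector confidence that each word is disfluent.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one disfluent and one fluent word")
    # Count disfluent/fluent pairs that are ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical utterance with a repetition: the first "I want" is disfluent.
words = ["I", "want", "I", "want", "a", "ticket"]
labels = [1, 1, 0, 0, 0, 0]
scores = [0.8, 0.4, 0.2, 0.5, 0.1, 0.05]  # made-up detector outputs

auc = roc_auc(labels, scores)
print(f"word-level AUC: {auc:.3f}")
```

An AUC of 1.0 would mean every disfluent word is scored above every fluent word, and 0.5 corresponds to chance ranking. The pairwise form used here is quadratic in the number of words, which is acceptable for a small illustration but not for large test sets.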


Bibliographic reference. Gupta, Rahul / Ananthakrishnan, Sankaranarayanan / Yang, Zhaojun / Narayanan, Shrikanth S. (2014): "Variable span disfluency detection in ASR transcripts", In INTERSPEECH-2014, 2892-2896.