We compare a Deep Neural Network (DNN) and a Conditional Random Field (CRF) disfluency detection and reconstruction system, both trained on the same features. Deep Neural Networks, despite their increasing popularity across a multitude of speech and language tasks, have never been applied to disfluency recognition. One of the most difficult classes of disfluency is the false start. We are interested in comparing these two approaches on the recognition of different types of disfluency. Our experimental results on the SSR v2 corpus show that the DNN approach slightly outperforms the CRF on repetition disfluencies. However, the DNN exhibits very low recall (17.6%, compared to 26.3% for the CRF) on the more difficult false start recognition subtask. When applied to a corpus of sentences containing only false starts, both methods give higher results (46.7% F-score for the CRF and 44.2% for the DNN). We also propose to improve the overall results on false starts by training our classifiers in two stages, the first to recognize non-false-start errors and the second for false starts only, and combining the two with a simple voting algorithm. This yields our best result of 52.2% F-score on false starts.
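The two-stage combination described above can be sketched as follows. The abstract does not specify the exact voting rule, so this is a minimal illustrative sketch under the assumption of a per-token union vote in which the false-start specialist takes priority on conflicts; the tag names (`'REP'`, `'FS'`, `'O'`) are hypothetical, not the authors' label set.

```python
def combine_votes(stage1_tags, stage2_tags):
    """Merge per-token tags from the two classification stages.

    Each argument is a list of tags over the same token sequence:
    'O' for a fluent token, any other tag for a disfluency label.
    A token keeps the non-'O' tag from either stage; stage 2
    (the false-start specialist) wins on conflicts. This union
    vote is an assumed stand-in for the paper's "simple voting
    algorithm".
    """
    merged = []
    for t1, t2 in zip(stage1_tags, stage2_tags):
        if t2 != 'O':          # false-start stage takes priority
            merged.append(t2)
        else:
            merged.append(t1)
    return merged

# Example: stage 1 flags a repetition, stage 2 flags a false start.
stage1 = ['REP', 'O', 'O', 'O', 'O', 'O', 'O']
stage2 = ['O', 'O', 'O', 'O', 'O', 'FS', 'O']
print(combine_votes(stage1, stage2))
# ['REP', 'O', 'O', 'O', 'O', 'FS', 'O']
```

A union vote of this kind trades precision for recall, which is consistent with the goal of recovering false starts that a single jointly trained classifier misses.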
Bibliographic reference. Bertero, Dario / Wang, Linlin / Chan, Ho Yin / Fung, Pascale (2015): "A comparison between a DNN and a CRF disfluency detection and reconstruction system", In INTERSPEECH-2015, 844-848.