16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Combination of NN and CRF Models for Joint Detection of Punctuation and Disfluencies

Eunah Cho, Kevin Kilgour, Jan Niehues, Alex Waibel

KIT, Germany

Inserting proper punctuation marks and deleting speech disfluencies are two of the most essential tasks in spoken language processing. These challenging tasks have prompted extensive research using various techniques, such as conditional random fields. Neural networks, however, remain relatively under-explored for them.
Combining modeling techniques with different advantages has the potential to lead to improvements. In this work, we first establish the performance of joint modeling of punctuation prediction and disfluency detection using neural networks. We then log-linearly combine a conditional random field (CRF) based model and a neural network (NN) based model, and show that the combined approach outperforms both individual models, by 2.7% and 3.5% in F-score for speech disfluency and punctuation detection, respectively. When used as a preprocessing step for machine translation, the combined approach also improves translation quality by 2.5 BLEU points compared to the baseline and by 0.6 BLEU points compared to the non-combined model.
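
As a rough illustration of what a log-linear combination of two models can look like, the sketch below combines per-token label distributions from a CRF and an NN by taking a weighted sum of their log-probabilities and renormalizing. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say at which level the models are combined, how the weights are set, or which label set is used, so the function names, the weights `w_crf` and `w_nn`, and the labels `O`, `COMMA`, and `PERIOD` are all hypothetical.

```python
import math


def log_linear_combine(p_crf, p_nn, w_crf=0.5, w_nn=0.5, eps=1e-12):
    """Combine two label distributions for one token log-linearly.

    p_crf and p_nn map label -> probability. The weights are hypothetical;
    in practice they would be tuned on held-out data.
    """
    labels = sorted(p_crf.keys() & p_nn.keys())
    # Weighted sum of log-probabilities; eps guards against log(0).
    scores = {
        label: w_crf * math.log(p_crf[label] + eps)
        + w_nn * math.log(p_nn[label] + eps)
        for label in labels
    }
    # Renormalize with log-sum-exp so the result is a distribution again.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {label: math.exp(s - log_z) for label, s in scores.items()}


def decode(crf_posteriors, nn_posteriors, w_crf=0.5, w_nn=0.5):
    """Pick the highest-scoring label for each token independently."""
    result = []
    for p_crf, p_nn in zip(crf_posteriors, nn_posteriors):
        combined = log_linear_combine(p_crf, p_nn, w_crf, w_nn)
        result.append(max(combined, key=combined.get))
    return result


# Made-up posteriors for a single token, for illustration only.
crf_token = {"O": 0.6, "COMMA": 0.3, "PERIOD": 0.1}
nn_token = {"O": 0.2, "COMMA": 0.7, "PERIOD": 0.1}
labels = decode([crf_token], [nn_token])
```

With equal weights, the label on which the two models jointly place the most probability mass wins; setting one weight to zero falls back to the other model alone.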


Bibliographic reference.  Cho, Eunah / Kilgour, Kevin / Niehues, Jan / Waibel, Alex (2015): "Combination of NN and CRF models for joint detection of punctuation and disfluencies", In INTERSPEECH-2015, 3650-3654.