Inserting proper punctuation marks and deleting speech disfluencies
are two essential tasks in spoken language processing.
These challenging tasks have prompted extensive research using various
techniques, such as conditional random fields. Neural networks, however,
remain relatively under-explored for them.
Combining modeling techniques that have different advantages has the potential to lead to improvements. In this work, we first establish the performance of joint modeling of punctuation prediction and disfluency detection using neural networks. We then combine a conditional random field (CRF) based model and a neural network (NN) based model log-linearly, and show that the combined approach outperforms both individual models, by 2.7% and 3.5% in F-score for speech disfluency and punctuation detection, respectively. When used as a preprocessing step for machine translation, the combined approach also improves translation quality by 2.5 BLEU points over the baseline and by 0.6 BLEU points over the non-combined model.
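As an illustration of what a log-linear combination of two models can look like, the following is a minimal sketch and not the authors' implementation. It assumes each model yields a per-token probability distribution over the same label set; the label names, the weights, and the function name `log_linear_combine` are hypothetical.

```python
import math

def log_linear_combine(p_crf, p_nn, w_crf=0.5, w_nn=0.5, eps=1e-12):
    """Combine two label distributions for one token log-linearly.

    p_crf, p_nn: dicts mapping the same labels to probabilities
                 (assumed inputs; the real models may expose scores differently).
    w_crf, w_nn: interpolation weights (hypothetical values, normally tuned
                 on held-out data).
    """
    # Weighted sum of log-probabilities; eps guards against log(0).
    scores = {
        label: w_crf * math.log(max(p_crf[label], eps))
        + w_nn * math.log(max(p_nn[label], eps))
        for label in p_crf
    }
    # Renormalize with the max-subtraction trick for numerical stability.
    top = max(scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}

# Usage: the two models disagree on the label that follows a token.
crf_probs = {"none": 0.6, "comma": 0.3, "period": 0.1}
nn_probs = {"none": 0.3, "comma": 0.6, "period": 0.1}
combined = log_linear_combine(crf_probs, nn_probs, w_crf=0.7, w_nn=0.3)
best_label = max(combined, key=combined.get)
```

Setting one weight to zero recovers the other model's distribution, which makes the weights a convenient knob for checking how much each model contributes.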
Bibliographic reference. Cho, Eunah / Kilgour, Kevin / Niehues, Jan / Waibel, Alex (2015): "Combination of NN and CRF models for joint detection of punctuation and disfluencies", In INTERSPEECH-2015, 3650-3654.