15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Multi-Pass Sentence-End Detection of Lecture Speech

Madina Hasan, Rama Doddipatla, Thomas Hain

University of Sheffield, UK

Making speech recognition output readable is an important task, and the first step is automatic sentence-end detection (SED). We introduce novel F0 derivative-based features and sentence-end distance features for SED that yield significant improvements in slot error rate (SER) in a multi-pass framework. Three SED approaches are compared on a spoken lecture task: hidden event language models, boosting, and conditional random fields (CRFs). Experiments on reference transcripts show that CRF-based models give the best results. Including pause duration features yields an improvement of 11.1% in SER. Adding the F0-derivative features gives a further reduction of 3.0% absolute, and backward distance features contribute an additional 0.5%. In the absence of audio, backward features alone yield a 2.2% absolute reduction in SER.
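The abstract reports all results as slot error rate but does not define the metric. The sketch below assumes the definition commonly used for boundary detection: missed sentence ends plus falsely inserted sentence ends, divided by the number of sentence ends in the reference. The function name `slot_error_rate` and the representation of boundaries as word indices are illustrative assumptions, not taken from the paper.

```python
def slot_error_rate(reference, hypothesis):
    """Slot error rate for sentence-end detection (assumed definition).

    reference, hypothesis: iterables of word indices after which a
    sentence end is marked. Hypothetical representation for illustration.
    """
    ref = set(reference)
    hyp = set(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one sentence end")
    misses = len(ref - hyp)        # reference boundaries that were not detected
    false_alarms = len(hyp - ref)  # detected boundaries absent from the reference
    return (misses + false_alarms) / len(ref)


# Four reference boundaries; the hypothesis finds two, misses two, inserts one.
ser = slot_error_rate({4, 9, 15, 22}, {4, 9, 16})
print(f"SER = {ser:.2%}")
```

Under this definition the rate is normalised by the reference count only, so a system that inserts many spurious boundaries can exceed 100%. An "absolute" reduction, as quoted in the abstract, is a difference in percentage points between two such rates.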

Bibliographic reference. Hasan, Madina / Doddipatla, Rama / Hain, Thomas (2014): "Multi-pass sentence-end detection of lecture speech", in INTERSPEECH-2014, 2902-2906.