8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Recovering Punctuation Marks for Automatic Speech Recognition

Fernando Batista (1), Diamantino Caseiro (2), Nuno Mamede (2), Isabel Trancoso (2)

(1) L2F INESC-ID/ISCTE, Portugal
(2) L2F INESC-ID/IST, Portugal

This paper presents results on recovering punctuation marks in speech transcriptions of a Portuguese broadcast news corpus. The approach is based on maximum entropy models and uses word, part-of-speech, time, and speaker information. The contribution of each feature type is analyzed individually. Results are reported separately for each focus condition, making it possible to analyze the differences in performance between planned and spontaneous speech.
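As a rough illustration of the kind of model the abstract describes, the sketch below builds word, part-of-speech, time, and speaker features for each word boundary and scores punctuation labels with the standard maximum entropy form, p(label | features) proportional to exp of the summed feature weights. Everything specific here is an assumption for illustration and not taken from the paper: the label set, the feature names, the 0.3 s pause threshold, the `Token` fields, and the hand-set weights (a real system would estimate weights from training data).

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pos: str        # part-of-speech tag
    start: float    # start time in seconds
    end: float      # end time in seconds
    speaker: str    # speaker identifier


# Hypothetical label set for the boundary after each word.
LABELS = ("none", "comma", "full_stop")


def boundary_features(tokens, i, pause_threshold=0.3):
    """Return binary feature names for the boundary after tokens[i]."""
    cur = tokens[i]
    feats = [f"w0={cur.word.lower()}", f"p0={cur.pos}"]
    if i + 1 >= len(tokens):
        feats.append("end_of_input")
        return feats
    nxt = tokens[i + 1]
    feats.append(f"w1={nxt.word.lower()}")   # word feature
    feats.append(f"p1={nxt.pos}")            # part-of-speech feature
    if nxt.start - cur.end >= pause_threshold:
        feats.append("long_pause")           # time feature (assumed threshold)
    if nxt.speaker != cur.speaker:
        feats.append("speaker_change")       # speaker feature
    return feats


def maxent_probs(feats, weights):
    """Compute p(label | feats) as a softmax over summed feature weights."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())               # subtract max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


# Toy usage with made-up tokens and hand-set (not trained) weights.
tokens = [
    Token("boa", "ADJ", 0.00, 0.25, "A"),
    Token("noite", "NOUN", 0.25, 0.70, "A"),
    Token("hoje", "ADV", 1.40, 1.75, "B"),
]
weights = {
    ("long_pause", "full_stop"): 2.0,
    ("speaker_change", "full_stop"): 1.0,
}
feats = boundary_features(tokens, 1)
probs = maxent_probs(feats, weights)
best = max(probs, key=probs.get)
```

With these toy weights, the boundary after "noite" is followed by a long pause and a speaker change, so `best` is `"full_stop"`. Keeping features as named binary indicators makes it easy to switch individual feature types on or off, which is one plausible way to measure each type's contribution separately.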

Bibliographic reference. Batista, Fernando / Caseiro, Diamantino / Mamede, Nuno / Trancoso, Isabel (2007): "Recovering punctuation marks for automatic speech recognition", in INTERSPEECH-2007, 2153-2156.