12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Commas Recovery with Syntactic Features in French and in Czech

Christophe Cerisara (1), Pavel Král (2), Claire Gardent (1)

(1) LORIA, France
(2) University of West Bohemia, Czech Republic

Automatic speech transcripts can be made more readable and more useful for further processing by enriching them with punctuation marks and other meta-linguistic information. In this work, we study how to improve the automatic recovery of commas, one of the most difficult punctuation marks to recover, in French and in Czech. We show that comma detection performance improves substantially in both languages when syntactic features derived from dependency structures are integrated into our baseline Conditional Random Field model. We further study the relative impact of language-independent and language-specific features, and show that combining the two gives the largest improvement. Finally, we discuss the robustness of these features to speech recognition errors.
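As a rough illustration of what features derived from dependency structures could look like for a sequence labeller that decides whether a comma follows each word, here is a minimal Python sketch. The abstract does not list the paper's actual feature set, so everything here is an assumption made for illustration: the function names `token_features` and `sentence_features`, the feature names, and the input encoding (`heads[i]` is the index of the governor of word `i`, with `-1` for the root).

```python
from typing import Dict, List


def token_features(words: List[str], heads: List[int],
                   labels: List[str], i: int) -> Dict[str, object]:
    """Hypothetical features for deciding whether a comma follows words[i].

    heads[i] is the index of the governor of word i (-1 for the root);
    labels[i] is the dependency label of the arc from that governor.
    """
    n = len(words)
    # Lexical context, as a plain baseline model might use.
    feats: Dict[str, object] = {
        "word": words[i].lower(),
        "next_word": words[i + 1].lower() if i + 1 < n else "</s>",
    }
    # Illustrative syntactic features (assumed, not taken from the paper).
    feats["deprel"] = labels[i]
    # Signed distance to the governor; 0 is used for the root.
    feats["head_offset"] = heads[i] - i if heads[i] >= 0 else 0
    # Number of dependency arcs that span the boundary between i and i + 1.
    feats["arcs_over_boundary"] = sum(
        1 for d, h in enumerate(heads)
        if h >= 0 and min(d, h) <= i < max(d, h)
    )
    return feats


def sentence_features(words: List[str], heads: List[int],
                      labels: List[str]) -> List[Dict[str, object]]:
    """One feature dictionary per word, in sentence order."""
    return [token_features(words, heads, labels, i) for i in range(len(words))]


# Toy example: "However we disagree", with both other words attached to the verb.
example = sentence_features(
    ["However", "we", "disagree"],
    [2, 2, -1],
    ["advmod", "nsubj", "root"],
)
```

Per-word feature dictionaries of this kind are a common input format for CRF toolkits. The labels to predict would be binary (comma or no comma after the word), and the dependency parse would have to come from a parser run on the unpunctuated transcript.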


Bibliographic reference.  Cerisara, Christophe / Král, Pavel / Gardent, Claire (2011): "Commas recovery with syntactic features in French and in Czech", In INTERSPEECH-2011, 1413-1416.