7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Maximum Entropy Model for Punctuation Annotation from Speech

Jing Huang, Geoffrey Zweig

IBM T.J. Watson Research Center, USA

In this paper we develop a maximum-entropy-based method for annotating spontaneous conversational speech with punctuation. The goal of this task is to make automatic transcriptions more readable to humans and to render them in a form that is useful for subsequent natural language processing and discourse analysis. Our basic approach is to view the insertion of punctuation as a form of tagging, in which each word is tagged with the appropriate punctuation, and to apply a maximum entropy tagger that uses both lexical and prosodic features. We present experimental results on Switchboard data with both reference transcriptions and transcriptions produced by a speech recognition system.
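The tagging view described above can be illustrated with a minimal sketch. It is not the tagger from the paper: the tag inventory, the feature templates, the 0.3 s pause threshold, and the toy utterance are all assumptions made for illustration. The sketch relies only on the fact that a maximum entropy classifier has the same form as multinomial logistic regression, and it fits the weights by plain stochastic gradient ascent, which stands in for whatever training procedure the authors used.

```python
import math
from collections import defaultdict

# Hypothetical tag inventory: the punctuation mark that follows each word.
TAGS = ["NONE", "COMMA", "PERIOD", "QUESTION"]

def features(words, pauses, i):
    """Binary features for word i: lexical context plus one coarse prosodic cue.

    pauses[i] is the silence (in seconds) after word i; the 0.3 s threshold
    is an illustrative assumption, not a value from the paper.
    """
    nxt = words[i + 1] if i + 1 < len(words) else "</s>"
    return [
        "bias",
        "w=" + words[i],
        "next=" + nxt,
        "long_pause" if pauses[i] > 0.3 else "short_pause",
    ]

def predict_proba(weights, feats):
    """Maximum entropy form: p(tag | feats) is proportional to exp(sum of weights)."""
    scores = [sum(weights[(f, t)] for f in feats) for t in TAGS]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train(data, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold_tag in data:
            probs = predict_proba(weights, feats)
            for t, p in zip(TAGS, probs):
                # Gradient per active feature: observed count minus expected count.
                grad = (1.0 if t == gold_tag else 0.0) - p
                for f in feats:
                    weights[(f, t)] += lr * grad
    return weights

def tag(weights, words, pauses):
    """Assign each word its most probable punctuation tag."""
    out = []
    for i in range(len(words)):
        probs = predict_proba(weights, features(words, pauses, i))
        out.append(TAGS[probs.index(max(probs))])
    return out

# Toy utterance (invented): "yeah, i think so. do you agree?"
words = ["yeah", "i", "think", "so", "do", "you", "agree"]
pauses = [0.1, 0.0, 0.0, 0.6, 0.0, 0.0, 0.8]
gold = ["COMMA", "NONE", "NONE", "PERIOD", "NONE", "NONE", "QUESTION"]

data = [(features(words, pauses, i), gold[i]) for i in range(len(words))]
weights = train(data)
predicted = tag(weights, words, pauses)
print(list(zip(words, predicted)))
```

Each word is classified independently here; a fuller tagger would typically also condition on previously assigned tags and on richer prosodic measurements, which this sketch leaves out.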

Bibliographic reference. Huang, Jing / Zweig, Geoffrey (2002): "Maximum entropy model for punctuation annotation from speech", in ICSLP-2002, 917-920.