8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Automatic Transformation of Lecture Transcription into Document Style using Statistical Framework

Tatsuya Kawahara (1), Kazuya Shitaoka (1), Hiroaki Nanjo (2)

(1) Kyoto University, Japan
(2) Ryukoku University, Japan

This paper addresses the automatic transformation of spoken-style text into written-style text. Exact transcriptions and speech recognition results of live lectures include many spoken-language expressions; they are therefore unsuitable as documents and need to be edited. We present a method that applies the statistical approach used in machine translation to this post-processing task. Specifically, we implement the correction of colloquial expressions, the deletion of fillers, the insertion of periods, and the insertion of particles in an integrated manner. A preliminary evaluation confirms that the statistical transformation framework works well, and we achieved high recall and precision rates for period and particle insertion.
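
For readers unfamiliar with the statistical approach used in machine translation, it is commonly written as a noisy-channel decision rule. The sketch below is only an illustration of that general formulation applied to this task; the symbols V (spoken-style input), W (written-style output), and the split into a language model and a transformation model are assumptions made here and are not taken from the abstract.

```latex
% Illustrative noisy-channel formulation (requires amsmath).
% Assumed notation, not from the abstract:
%   V       : observed spoken-style word sequence (transcript or ASR output)
%   W       : candidate written-style word sequence
%   P(W)    : language model of written-style text
%   P(V|W)  : transformation model relating written style to spoken style
\[
  \hat{W}
  = \operatorname*{argmax}_{W} P(W \mid V)
  = \operatorname*{argmax}_{W} P(W)\, P(V \mid W)
\]
```

Under this reading, filler deletion, period insertion, particle insertion, and correction of colloquial expressions would each correspond to candidate differences between V and W that are scored jointly, which is one way to interpret "in an integrated manner."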


Bibliographic reference.  Kawahara, Tatsuya / Shitaoka, Kazuya / Nanjo, Hiroaki (2004): "Automatic transformation of lecture transcription into document style using statistical framework", In INTERSPEECH-2004, 2881-2884.