ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
This paper addresses the automatic transformation of spoken-style text into written-style text. Exact transcriptions and speech recognition results of live lectures contain many spoken-language expressions, so they are unsuitable as documents and need to be edited. We present a method that applies the statistical approach used in machine translation to this post-processing task. Specifically, we implement the correction of colloquial expressions, the deletion of fillers, the insertion of periods, and the insertion of particles in an integrated manner. A preliminary evaluation confirms that the statistical transformation framework works well, achieving high recall and precision for period and particle insertion.
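As an illustration only, the integrated transformation can be sketched as a noisy-channel search, which is one common reading of "the statistical approach used in machine translation"; the abstract does not give the actual models, so this formulation is an assumption. Every name and number below (`CHANNEL`, `BIGRAM`, `UNSEEN`, `P_PERIOD`, `transform`) is hypothetical, the toy input is English rather than lecture Japanese, and particle insertion is left out for brevity.

```python
import math

# Hypothetical channel table: spoken token -> [(written tokens, probability)].
# An empty tuple means the token is deleted (filler removal).
CHANNEL = {
    "um": [((), 0.9), (("um",), 0.1)],
    "gonna": [(("going", "to"), 0.8), (("gonna",), 0.2)],
}

# Hypothetical bigram probabilities of written-style text (illustrative values).
BIGRAM = {
    ("<s>", "we"): 0.5, ("we", "are"): 0.5, ("are", "going"): 0.4,
    ("going", "to"): 0.8, ("to", "start"): 0.3, ("start", "."): 0.6,
    (".", "this"): 0.3, ("this", "is"): 0.6, ("is", "new"): 0.2,
    ("new", "."): 0.5, (".", "</s>"): 0.9,
}
UNSEEN = 1e-4    # assumed floor probability for unseen bigrams
P_PERIOD = 0.3   # assumed prior of inserting a period after a token


def lm_logprob(prev, words):
    """Sum bigram log-probabilities of `words` when they follow `prev`."""
    total = 0.0
    for word in words:
        total += math.log(BIGRAM.get((prev, word), UNSEEN))
        prev = word
    return total


def transform(spoken, beam_size=8):
    """Beam search over deletions, corrections, and period insertions."""
    beam = [(0.0, ("<s>",))]  # (log score, written tokens so far)
    for token in spoken:
        # Tokens without a channel entry are copied unchanged.
        options = CHANNEL.get(token, [((token,), 1.0)])
        expanded = []
        for score, out in beam:
            for written, p_channel in options:
                for add_period in (False, True):
                    words = written + ((".",) if add_period else ())
                    p_insert = P_PERIOD if add_period else 1.0 - P_PERIOD
                    new_score = (score + math.log(p_channel)
                                 + math.log(p_insert)
                                 + lm_logprob(out[-1], words))
                    expanded.append((new_score, out + words))
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:beam_size]
    # Score the end-of-text transition so a final period can be preferred.
    best = max(beam, key=lambda hyp: hyp[0] + lm_logprob(hyp[1][-1], ("</s>",)))
    return list(best[1][1:])  # drop the "<s>" start marker


print(" ".join(transform("um we are gonna start this is um new".split())))
```

Scoring all operations inside one search, rather than running filler deletion, correction, and period insertion as separate passes, lets the language model arbitrate between them; that is the sense in which this sketch is "integrated". A real system would estimate both tables from parallel spoken and written data.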
Bibliographic reference. Nanjo, Hiroaki / Shitaoka, Kazuya / Kawahara, Tatsuya (2003): "Automatic transformation of lecture transcription into document style using statistical framework", in SSPR-2003, paper TAP12.