Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Automatic Alignment and Error Correction of Human Generated Transcripts for Long Speech Recordings

Timothy J. Hazen

Massachusetts Institute of Technology, USA

In this paper we examine the issues of aligning and correcting approximate human-generated transcripts for long audio files. Accurate time-aligned transcriptions provide easier access to audio materials by aiding downstream applications such as indexing, summarizing, and retrieving audio segments. Accurate time alignments are also necessary when incorporating audio data into the training data for a speech recognizer’s acoustic model. We provide an initial analysis of manual transcriptions, which shows that there can be significant differences between the "approximate" manual transcripts generated by typical commercial transcription services and what was actually spoken in the recording. We then present a new alignment approach for approximate transcriptions of long audio files that is designed to discover and correct errors in the manual transcription during the alignment process.

Bibliographic reference. Hazen, Timothy J. (2006): "Automatic alignment and error correction of human generated transcripts for long speech recordings", in INTERSPEECH-2006, paper 1258-Wed1CaP.2.