In this paper we examine the issues of aligning and correcting approximate human-generated transcripts for long audio files. Accurate time-aligned transcriptions provide easier access to audio materials by aiding downstream applications such as indexing, summarizing, and retrieving audio segments. Accurate time alignments are also necessary when incorporating audio data into the training data for a speech recognizer's acoustic model. We provide an initial analysis of manual transcriptions, which shows that there can be significant differences between the "approximate" manual transcripts generated by typical commercial transcription services and what was actually spoken in the recording. We then present a new alignment approach for approximate transcriptions of long audio files that is designed to discover and correct errors in the manual transcription during the alignment process.
Cite as: Hazen, T.J. (2006) Automatic alignment and error correction of human generated transcripts for long speech recordings. Proc. Interspeech 2006, paper 1258-Wed1CaP.2, doi: 10.21437/Interspeech.2006-449
@inproceedings{hazen06_interspeech,
  author={Timothy J. Hazen},
  title={{Automatic alignment and error correction of human generated transcripts for long speech recordings}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1258-Wed1CaP.2},
  doi={10.21437/Interspeech.2006-449}
}