This paper presents the approach we used to build a training database from a set of recorded newscasts for which only inaccurate transcriptions were available. The transcribed segments correspond to prepared anchor texts and journalist stories, which are not necessarily in the chronological order of their actual presentation. No segment time boundaries are provided. Our main concern is thus to establish the time marks that delimit the audio segment corresponding to each text. To solve this problem, we developed a time-marking procedure based on our speech recognition engine. We obtain a segmentation accuracy of 80%.
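The abstract does not describe the time-marking procedure in detail, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a recognizer that outputs words with start and end times, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever alignment the actual system performs. The function name `time_mark` and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def time_mark(segment_words, recognized):
    """Estimate time marks for one transcribed segment (illustrative only).

    segment_words: list of words from the (possibly inaccurate) transcription.
    recognized:    list of (word, start_sec, end_sec) tuples, assumed to come
                   from a speech recognizer run over the whole recording.
    Returns (start_sec, end_sec, matched_fraction), or None if nothing matches.
    """
    hyp_words = [word for word, _, _ in recognized]
    matcher = SequenceMatcher(None, segment_words, hyp_words, autojunk=False)
    # Keep only the non-empty matching blocks (the last block is a sentinel).
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return None
    first, last = blocks[0], blocks[-1]
    start = recognized[first.b][1]                # start time of first matched word
    end = recognized[last.b + last.size - 1][2]   # end time of last matched word
    matched = sum(b.size for b in blocks)
    return start, end, matched / len(segment_words)


# Hypothetical recognizer output for a short stretch of audio.
recognized = [
    ("good", 0.0, 0.3), ("evening", 0.3, 0.8),
    ("the", 1.0, 1.1), ("budget", 1.1, 1.6), ("was", 1.6, 1.8),
    ("tabled", 1.8, 2.3), ("today", 2.3, 2.8),
    ("in", 3.0, 3.1), ("sports", 3.1, 3.6),
]

# Transcribed segment with one error ("is" instead of "was").
segment = ["the", "budget", "is", "tabled", "today"]
print(time_mark(segment, recognized))
```

Because the segments are not in chronological order, each one would be matched against the full recognizer output independently. The matched fraction gives a rough confidence value that could be used to reject segments whose transcription diverges too far from the audio. A stray match far from the true location would stretch the estimated span, so a practical system would need a more robust alignment than this sketch.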
Cite as: Cardinal, P., Boulianne, G., Comeau, M. (2005) Segmentation of recordings based on partial transcriptions. Proc. Interspeech 2005, 3345-3348, doi: 10.21437/Interspeech.2005-859
@inproceedings{cardinal05_interspeech,
  author={Patrick Cardinal and Gilles Boulianne and Michel Comeau},
  title={{Segmentation of recordings based on partial transcriptions}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={3345--3348},
  doi={10.21437/Interspeech.2005-859}
}