Large quantities of audio data with associated text, such as audiobooks, are now available. These data are attractive for a range of research areas because they include features that extend beyond the level of single sentences. The proposed approach automatically generates high-quality transcriptions and associated alignments for this form of data. It combines information from lightly supervised recognition with the original text to yield the final transcription. The scheme is fully automatic and has been successfully applied to a number of audiobooks. Performance measurements show low word and sentence error rates as well as high sentence boundary accuracy.
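As a rough illustration of what combining recognizer output with the original text can involve, the sketch below aligns two word sequences using Python's standard `difflib` module and marks where they agree. It is not the authors' method: the function name `label_regions`, the status labels, and the toy sentences are assumptions made for this example, and a real system would also carry timing information from the recognizer.

```python
from difflib import SequenceMatcher


def label_regions(original_words, recognized_words):
    """Align two word lists and label each aligned region.

    Hypothetical helper for illustration only: regions where the
    recognizer output matches the original text are "confirmed",
    all other regions are "mismatch".
    """
    matcher = SequenceMatcher(None, original_words, recognized_words, autojunk=False)
    regions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        regions.append({
            "status": "confirmed" if tag == "equal" else "mismatch",
            "text": original_words[i1:i2],          # words from the book text
            "recognized": recognized_words[j1:j2],  # words from the recognizer
        })
    return regions


def confirmed_fraction(regions):
    """Fraction of original-text words that the recognizer output confirms."""
    total = sum(len(r["text"]) for r in regions)
    confirmed = sum(len(r["text"]) for r in regions if r["status"] == "confirmed")
    return confirmed / total if total else 0.0


if __name__ == "__main__":
    # Toy data (assumed): the recognizer drops the final "s" of "times".
    original = "it was the best of times".split()
    recognized = "it was the best of time".split()
    for region in label_regions(original, recognized):
        print(region["status"], region["text"], region["recognized"])
```

Mismatched regions are the places where the text and the audio may genuinely differ, or where the recognizer erred, so they are the natural candidates for closer inspection before a final transcription is fixed.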
Bibliographic reference. Braunschweiler, Norbert / Gales, Mark J. F. / Buchholz, Sabine (2010): "Lightly supervised recognition for automatic alignment of large coherent speech recordings", In INTERSPEECH-2010, 2222-2225.