11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Lightly Supervised Recognition for Automatic Alignment of Large Coherent Speech Recordings

Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz

Toshiba Research Europe Ltd., UK

Large quantities of audio data with associated text, such as audiobooks, are now available. These data are attractive for a range of research areas because they include features that extend beyond the level of single sentences. The proposed approach automatically generates high-quality transcriptions and associated alignments for this form of data. It combines information from lightly supervised recognition with the original text to yield the final transcription. The scheme is fully automatic and has been successfully applied to a number of audiobooks. Performance measurements show low word and sentence error rates as well as high sentence boundary accuracy.
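
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how word and sentence error rates are conventionally computed from reference and hypothesis transcriptions. It is not taken from the paper: the function names (`edit_distance`, `word_error_rate`, `sentence_error_rate`) and the whitespace tokenisation are assumptions made purely for illustration, and the paper's own scoring setup may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(ref_sentences, hyp_sentences):
    """Total word-level edit errors divided by total reference words.
    Assumes the two lists are paired sentence by sentence."""
    total = sum(len(r.split()) for r in ref_sentences)
    if total == 0:
        raise ValueError("reference contains no words")
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(ref_sentences, hyp_sentences))
    return errors / total


def sentence_error_rate(ref_sentences, hyp_sentences):
    """Fraction of sentences containing at least one word error."""
    if not ref_sentences:
        raise ValueError("reference contains no sentences")
    wrong = sum(r.split() != h.split()
                for r, h in zip(ref_sentences, hyp_sentences))
    return wrong / len(ref_sentences)


# Illustrative data only, not from the paper.
refs = ["the cat sat on the mat", "hello world"]
hyps = ["the cat sat on a mat", "hello world"]
wer = word_error_rate(refs, hyps)
ser = sentence_error_rate(refs, hyps)
```

In this toy example one of eight reference words is substituted, so the word error rate is 0.125, and one of two sentences contains an error, so the sentence error rate is 0.5.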

Bibliographic reference. Braunschweiler, Norbert / Gales, Mark J. F. / Buchholz, Sabine (2010): "Lightly supervised recognition for automatic alignment of large coherent speech recordings", in INTERSPEECH-2010, 2222-2225.