Although tremendous progress has been made in speech recognition technology, with today's state-of-the-art systems able to transcribe unrestricted continuous speech from broadcast data, these systems rely on the availability of large amounts of manually transcribed acoustic training data. Obtaining such data is both time-consuming and expensive, requiring trained human annotators and substantial amounts of supervision. In this paper we describe some recent experiments using lightly supervised techniques for acoustic model training in order to reduce the system development cost. The strategy we investigate uses a speech recognizer to transcribe unannotated broadcast news data, and optionally combines the hypothesized transcription with associated, but unaligned, closed captions or transcripts to create labeled training data. We show that this approach can dramatically reduce the cost of building acoustic models.
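The core idea of combining recognizer hypotheses with unaligned closed captions can be illustrated with a minimal sketch: align the two word sequences and keep only the stretches where they agree, treating those as reliable labels for training. This is an assumption-laden simplification of the paper's procedure (function name, `min_run` threshold, and the use of `difflib` are illustrative choices, not the authors' implementation):

```python
from difflib import SequenceMatcher

def filter_with_captions(hypothesis, captions, min_run=3):
    """Keep only word runs where the recognizer hypothesis and the
    unaligned closed captions agree; these runs serve as labeled
    training segments. A hypothetical sketch, not the paper's system."""
    hyp = hypothesis.split()
    cap = captions.split()
    matcher = SequenceMatcher(a=hyp, b=cap, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        # Discard short matches, which may agree only by chance.
        if block.size >= min_run:
            segments.append(" ".join(hyp[block.a:block.a + block.size]))
    return segments

hyp = "the president said on tuesday that the economy was improving"
cap = "the president said tuesday the economy was improving slowly"
print(filter_with_captions(hyp, cap))
# → ['the president said', 'the economy was improving']
```

In practice the agreeing segments would be mapped back to their audio spans before being used to retrain the acoustic models; regions where the caption and hypothesis disagree are simply dropped rather than hand-corrected, which is what makes the approach "lightly" supervised.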
Cite as: Lamel, L., Gauvain, J.-L., Adda, G. (2000) Lightly supervised acoustic model training. Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium, 150-154
@inproceedings{lamel00_asr,
  author={Lori Lamel and Jean-Luc Gauvain and Gilles Adda},
  title={{Lightly supervised acoustic model training}},
  year={2000},
  booktitle={Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium},
  pages={150--154}
}