EUROSPEECH 2003 - INTERSPEECH 2003
State-of-the-art speech recognition systems are trained using human transcriptions of speech utterances. In this paper, we describe a method to combine active and unsupervised learning for automatic speech recognition (ASR). The goal is to minimize the human supervision needed to train acoustic and language models and to maximize performance given the transcribed and untranscribed data. Active learning aims to reduce the number of training examples that must be labeled by automatically processing the unlabeled examples and then selecting the most informative ones with respect to a given cost function. For unsupervised learning, we utilize the remaining untranscribed data through their ASR output and word confidence scores. Our experiments show that combining active and unsupervised learning can reduce the amount of labeled data needed for a given word accuracy by 75%.
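The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Utterance` type, the mean-of-word-confidences utterance score, the fixed labeling `budget`, and the `min_conf` threshold are all assumptions introduced here for illustration. The sketch sends the least confident utterances to human transcribers and keeps the ASR output of the remaining utterances together with their word confidence scores.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    """Hypothetical container for one untranscribed utterance."""
    utt_id: str
    hypothesis: List[str]          # ASR 1-best word sequence
    word_confidences: List[float]  # one score in [0, 1] per hypothesized word


def utterance_confidence(utt: Utterance) -> float:
    """Mean word confidence (an assumed aggregation, not taken from the paper)."""
    if not utt.word_confidences:
        return 0.0
    return sum(utt.word_confidences) / len(utt.word_confidences)


def split_pool(pool: List[Utterance],
               budget: int) -> Tuple[List[Utterance], List[Utterance]]:
    """Active learning step: rank by confidence, lowest first.

    The `budget` least confident utterances are treated as the most
    informative and are selected for human transcription; the rest
    are left for the unsupervised step.
    """
    ranked = sorted(pool, key=utterance_confidence)
    return ranked[:budget], ranked[budget:]


def machine_transcripts(
    utts: List[Utterance], min_conf: float = 0.5
) -> List[Tuple[str, List[Tuple[str, float]]]]:
    """Unsupervised step: keep ASR words with their confidence scores.

    Words scoring below `min_conf` are dropped (an assumed filtering rule);
    the kept scores could serve as weights when updating a model.
    """
    result = []
    for utt in utts:
        kept = [(word, conf)
                for word, conf in zip(utt.hypothesis, utt.word_confidences)
                if conf >= min_conf]
        result.append((utt.utt_id, kept))
    return result


if __name__ == "__main__":
    pool = [
        Utterance("u1", ["call", "home"], [0.9, 0.8]),
        Utterance("u2", ["pay", "bill"], [0.2, 0.4]),
        Utterance("u3", ["check", "balance"], [0.6, 0.7]),
    ]
    to_transcribe, remaining = split_pool(pool, budget=1)
    print([u.utt_id for u in to_transcribe])
    print(machine_transcripts(remaining, min_conf=0.65))
```

In an iterative setup, the newly transcribed utterances and the confidence-weighted machine transcripts would both be added to the training data before the models are retrained and the pool is rescored.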
Bibliographic reference. Riccardi, Giuseppe / Hakkani-Tur, Dilek Z. (2003): "Active and unsupervised learning for automatic speech recognition", In EUROSPEECH-2003, 1825-1828.