8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Unsupervised Training with Directed Manual Transcription for Recognising Mandarin Broadcast Audio

K. Yu, M. J. F. Gales, P. C. Woodland

University of Cambridge, UK

The performance of unsupervised discriminative training has been found to be highly dependent on the accuracy of the initial automatic transcription. This paper examines a strategy in which a relatively small amount of poorly recognised data is manually transcribed to supplement the automatically transcribed data. Experiments were carried out on a Mandarin broadcast transcription task using both Broadcast News (BN) and Broadcast Conversation (BC) data. A range of experimental conditions is compared for both maximum likelihood and discriminative training using directed manual transcription. For BC data, fully unsupervised discriminative training obtains only 17% of the reduction in character error rate (CER) achieved by supervised training. Automatically selecting 18% of the data for manual transcription yields 50% of the CER gain from supervised training. The directed approach to selecting data also outperforms manual transcription of a randomly chosen set of the same data.
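The abstract does not say how the poorly recognised data are identified, so the following is only a minimal sketch under stated assumptions: each utterance carries a recogniser confidence score, the least confident utterances are sent for manual transcription, and the budget is a fraction of total audio duration. The names `Utterance`, `select_for_manual_transcription` and `fraction_of_supervised_gain` are hypothetical and do not come from the paper. The second function shows one natural reading of figures such as "17%" and "50%" of the supervised CER gain, namely the CER reduction of a method divided by the CER reduction of fully supervised training.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    duration: float    # audio length in seconds
    confidence: float  # ASSUMED recogniser confidence in [0, 1]


def select_for_manual_transcription(utterances, fraction):
    """Pick the least confident utterances within a duration budget.

    ASSUMPTION: low confidence is used as a proxy for "poorly recognised";
    the paper's actual selection criterion is not given in the abstract.
    """
    budget = fraction * sum(u.duration for u in utterances)
    selected, used = [], 0.0
    # Visit utterances from least to most confident.
    for utt in sorted(utterances, key=lambda u: u.confidence):
        if used + utt.duration > budget:
            continue  # does not fit; a shorter utterance later might
        selected.append(utt)
        used += utt.duration
    return selected


def fraction_of_supervised_gain(cer_baseline, cer_method, cer_supervised):
    """CER reduction of a method relative to that of supervised training."""
    return (cer_baseline - cer_method) / (cer_baseline - cer_supervised)


if __name__ == "__main__":
    # Illustrative values only; none of these numbers are from the paper.
    pool = [
        Utterance("a", 10.0, 0.90),
        Utterance("b", 10.0, 0.20),
        Utterance("c", 10.0, 0.50),
        Utterance("d", 10.0, 0.70),
        Utterance("e", 10.0, 0.95),
    ]
    chosen = select_for_manual_transcription(pool, fraction=0.5)
    print([u.utt_id for u in chosen])
    print(fraction_of_supervised_gain(20.0, 19.0, 18.0))
```

A random-selection baseline of the kind the abstract compares against could be obtained by shuffling the pool instead of sorting it by confidence, keeping the same duration budget.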


Bibliographic reference.  Yu, K. / Gales, M. J. F. / Woodland, P. C. (2007): "Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio", In INTERSPEECH-2007, 1709-1712.