9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Towards Automatic Learning in LVCSR: Rapid Development of a Persian Broadcast Transcription System

Christian Gollan, Hermann Ney

RWTH Aachen University, Germany

We present a new method for automatically learning and refining pronunciations for large vocabulary continuous speech recognition. The method starts from a small amount of transcribed data and applies automatic transcription techniques to additional untranscribed speech data.

The performance of speech recognition systems usually depends on the amount and quality of the available transcribed training data. Creating such data is costly and tedious; the approach presented here allows training with only small amounts of annotated data.

The model parameters of a statistical joint-multigram grapheme-to-phoneme converter are iteratively estimated using small amounts of manual transcriptions and relatively larger amounts of automatic transcriptions, so that the system improves itself in an unsupervised manner.
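The iterative estimation described above can be illustrated with a minimal self-training sketch. This is not the authors' joint-multigram model: it substitutes a toy table that maps each grapheme to its most frequent phoneme and assumes a one-to-one alignment between letters and phonemes. The names `train`, `convert`, `agreement`, and `self_train`, the `threshold` parameter, and the agreement-based filter are all hypothetical and chosen only for illustration. The `automatic` pairs stand in for the output of an automatic transcription pass.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Estimate a toy grapheme-to-phoneme table from (word, phonemes) pairs.

    Assumption: each grapheme aligns with exactly one phoneme. This is a
    simplification, not a joint-multigram model.
    """
    counts = defaultdict(Counter)
    for word, phones in pairs:
        if len(word) != len(phones):
            continue  # skip pairs the toy alignment cannot handle
        for grapheme, phone in zip(word, phones):
            counts[grapheme][phone] += 1
    # Keep the most frequent phoneme for each grapheme.
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def convert(model, word):
    """Return a pronunciation, or None if any grapheme is unseen."""
    if any(g not in model for g in word):
        return None
    return [model[g] for g in word]


def agreement(model, word, phones):
    """Fraction of positions where the model matches the automatic phoneme."""
    if not word or len(word) != len(phones):
        return 0.0
    hits = sum(1 for g, p in zip(word, phones) if model.get(g) == p)
    return hits / len(word)


def self_train(seed, automatic, threshold=0.5, iterations=3):
    """Re-estimate the model from manual pairs plus accepted automatic pairs.

    seed:      manually transcribed (word, phonemes) pairs
    automatic: (word, phonemes) pairs from automatic transcription
               (hypothetical stand-in for recognizer output)
    """
    model = train(seed)
    for _ in range(iterations):
        # Accept automatic pairs that the current model mostly agrees with.
        accepted = [
            (w, p) for w, p in automatic
            if agreement(model, w, p) >= threshold
        ]
        model = train(list(seed) + accepted)
    return model
```

In this sketch, each pass can accept automatic pairs that the previous model rejected, so coverage grows without further manual annotation. The acceptance threshold plays the role of a confidence filter; how the actual system selects automatic transcriptions is not stated in the abstract.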

Using the new approach, we create a Persian broadcast transcription system from less than five hours of transcribed speech and 52 hours of untranscribed audio data.


Bibliographic reference. Gollan, Christian / Ney, Hermann (2008): "Towards automatic learning in LVCSR: rapid development of a Persian broadcast transcription system", in INTERSPEECH-2008, 1441-1444.