We present a new method for automatic learning and refining of pronunciations for large vocabulary continuous speech recognition which starts from a small amount of transcribed data and uses automatic transcription techniques for additional untranscribed speech data.
The performance of speech recognition systems usually depends on the amount and quality of the available transcribed training data. Creating such data is costly and tedious; the approach presented here allows training with only small amounts of annotated data.
The model parameters of a statistical joint-multigram grapheme-to-phoneme converter are estimated iteratively from a small amount of manual transcriptions and a relatively larger amount of automatic transcriptions, so the system improves itself in an unsupervised manner.
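The iterative estimation can be illustrated with a minimal Python sketch. It is not the authors' implementation: a toy one-to-one letter-to-phone table stands in for the joint-multigram model, and `fake_recognize` is a hypothetical stub that stands in for the speech recognizer by accepting a word only when the current model's pronunciation is within one phone of what was "heard". All function names and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_g2p(lexicon):
    """Estimate a toy letter-to-phone table from (word, phones) pairs.

    Assumption: one letter maps to one phone. A joint-multigram model
    would instead learn variable-length grapheme/phoneme units.
    """
    counts = defaultdict(Counter)
    for word, phones in lexicon:
        if len(word) != len(phones):
            continue  # toy model: skip pairs that cannot be aligned 1:1
        for letter, phone in zip(word, phones):
            counts[letter][phone] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def apply_g2p(model, word):
    """Predict a pronunciation; unknown letters become '?'."""
    return [model.get(letter, "?") for letter in word]


def fake_recognize(utterance, model):
    """Hypothetical recognizer stub (not a real ASR system).

    The utterance is a (word, heard_phones) pair. The word is 'recognized'
    only if the current model's guess differs in at most one phone.
    """
    word, heard = utterance
    guess = apply_g2p(model, word)
    mismatches = sum(g != h for g, h in zip(guess, heard))
    return [(word, heard)] if mismatches <= 1 else []


def self_train(seed_lexicon, untranscribed, recognize, iterations=3):
    """Re-estimate the model from manual plus automatic transcriptions."""
    model = train_g2p(seed_lexicon)
    for _ in range(iterations):
        automatic = []
        for utterance in untranscribed:
            automatic.extend(recognize(utterance, model))
        # The manual seed data is always kept; automatic data is refreshed.
        model = train_g2p(list(seed_lexicon) + automatic)
    return model


seed = [("ab", ["A", "B"])]
audio = [("abc", ["A", "B", "K"]), ("cd", ["K", "D"])]
model = self_train(seed, audio, fake_recognize, iterations=3)
print(apply_g2p(model, "dab"))
```

In this toy run the first pass can only absorb "abc", which teaches the model the letter "c"; only then does "cd" become recognizable in the second pass. This mirrors, in a very reduced form, how each round of automatic transcription can make more of the untranscribed data usable.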
Using the new approach, we create a Persian broadcast transcription system from less than five hours of transcribed speech and 52 hours of untranscribed audio data.
Bibliographic reference. Gollan, Christian / Ney, Hermann (2008): "Towards automatic learning in LVCSR: rapid development of a Persian broadcast transcription system", In INTERSPEECH-2008, 1441-1444.