16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Improved Hindi Broadcast ASR by Adapting the Language Model and Pronunciation Model Using a priori Syntactic and Morphophonemic Knowledge

Preethi Jyothi, Mark Hasegawa-Johnson

University of Illinois at Urbana-Champaign, USA

In this work, we present a new large-vocabulary broadcast news ASR system for Hindi. Since Hindi has a largely phonemic orthography, the pronunciation model was generated automatically from text. We experiment with several variants of this model and study the effect of incorporating word boundary information into them. We also experiment with knowledge-based adaptations to the Hindi language model, derived in an unsupervised manner, that lead to small improvements in word error rate (WER). Our experiments were conducted on a new corpus assembled from publicly available Hindi news broadcasts. We evaluate our techniques on an open-vocabulary task and obtain competitive WERs on an unseen test set.

Bibliographic reference. Jyothi, Preethi / Hasegawa-Johnson, Mark (2015): "Improved Hindi broadcast ASR by adapting the language model and pronunciation model using a priori syntactic and morphophonemic knowledge", In INTERSPEECH-2015, 3164-3168.