Third Workshop on Spoken Language Technologies for Under-resourced Languages
Cape Town, South Africa
We report on our efforts toward an LVCSR system for the African language Hausa. We describe the Hausa text and speech database recently collected as a part of our GlobalPhone corpus. The data was complemented by a large collection of text data crawled from various Hausa websites. We achieve significant improvement by automatically substituting inconsistent or flawed pronunciation dictionary entries, including tone and vowel length information, applying state-of-the-art techniques for acoustic modeling, and crawling large quantities of text material from the Internet for language modeling. A system combination of the best grapheme- and phoneme-based 2-pass systems achieves a word error rate of 13.16% on the development set and 16.26% on the test set on read newspaper speech.
Index Terms: speech recognition, rapid language adaptation, Hausa, African language
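The results in the abstract are reported as word error rates. As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the placeholder sentences are illustrative assumptions; this is not the scoring tool used by the authors, and details such as text normalization are omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution (b -> x) and one deletion (d).
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors but the denominator is the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.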
Bibliographic reference. Schlippe, Tim / Djomgang, Edy Guevara Komgang / Vu, Ngoc Thang / Ochs, Sebastian / Schultz, Tanja (2012): "Hausa large vocabulary continuous speech recognition", In SLTU-2012, 11-14.