15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Discriminative Pronunciation Modeling for Dialectal Speech Recognition

Maider Lehr (1), Kyle Gorman (1), Izhak Shafran (2)

(1) Oregon Health & Science University, USA
(2) Google, USA

Speech recognizers are typically trained on data from a standard dialect and do not generalize to non-standard dialects. The mismatch occurs mainly in the acoustic realization of words, which is represented by the acoustic models and the pronunciation lexicon. Standard techniques for addressing this mismatch are generative in nature and include acoustic model adaptation and expansion of the lexicon with pronunciation variants, both of which have limited effectiveness. We present a discriminative pronunciation model whose parameters are learned jointly with the parameters of the language models. We tease apart the gains from modeling the transitions of canonical phones, the transduction from surface to canonical phones, and the language model. We report experiments on African American Vernacular English (AAVE) using NPR's StoryCorps corpus. Our models improve performance over the baseline by about 2.1% on AAVE, of which 0.6% can be attributed to the pronunciation model. The model learns the most relevant phonetic transformations for AAVE speech.

Bibliographic reference. Lehr, Maider / Gorman, Kyle / Shafran, Izhak (2014): "Discriminative pronunciation modeling for dialectal speech recognition", In INTERSPEECH-2014, 1458-1462.