14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Discriminative Pronunciation Modeling Based on Minimum Phone Error Training

Meixu Song, Qingqing Zhang, Jielin Pan, Yonghong Yan

Chinese Academy of Sciences, China

Introducing pronunciation models into decoding has proven beneficial for large-vocabulary continuous speech recognition (LVCSR). Since Minimum Phone Error (MPE) training has become an almost standard scheme for acoustic modeling, this paper investigates a discriminative pronunciation modeling method within the MPE training framework. To bring the pronunciation models into MPE training, the MPE auxiliary function is rewritten at the word level, where it decomposes into two parts: one for co-training the acoustic models and the other for discriminatively training the pronunciation models. On a Mandarin conversational telephone speech recognition task, compared with a baseline using a canonical lexicon, the discriminative pronunciation models reduced the Character Error Rate (CER) by 0.7% absolute on the LDC test set, and co-training the acoustic models yielded an additional CER reduction of about 1%.
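The abstract does not state the auxiliary function itself. As a rough illustration only, the sketch below assumes the standard MPE criterion (expected phone accuracy under acoustically scaled hypothesis posteriors) and replaces the usual lattice with a small N-best list. The pronunciation log-probability enters each hypothesis score additively, so its gradient takes the familiar MPE form kappa * gamma_h * (A_h - F). This is not the authors' derivation; `Hypothesis`, `posteriors`, `mpe_objective`, `mpe_score_gradients` and the scale `kappa` are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One competing hypothesis for a single utterance (hypothetical structure)."""
    acoustic_logp: float   # log p(O | phone sequence of the chosen pronunciations)
    pron_logp: float       # log P(pronunciations | words) from the pronunciation model
    lm_logp: float         # log P(words) from the language model
    phone_accuracy: float  # raw phone accuracy A(hyp, reference)


def posteriors(hyps: List[Hypothesis], kappa: float) -> List[float]:
    """Scaled hypothesis posteriors, normalised over the N-best list (assumed setup)."""
    scores = [kappa * (h.acoustic_logp + h.pron_logp + h.lm_logp) for h in hyps]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def mpe_objective(hyps: List[Hypothesis], kappa: float) -> float:
    """Expected phone accuracy F = sum_h gamma_h * A_h."""
    return sum(g * h.phone_accuracy for g, h in zip(posteriors(hyps, kappa), hyps))


def mpe_score_gradients(hyps: List[Hypothesis], kappa: float) -> List[float]:
    """dF/d(pron_logp_h) = kappa * gamma_h * (A_h - F) for each hypothesis h.

    Positive for hypotheses more accurate than average, so gradient ascent
    raises the probability of their pronunciations and lowers the others'.
    """
    gammas = posteriors(hyps, kappa)
    f = sum(g * h.phone_accuracy for g, h in zip(gammas, hyps))
    return [kappa * g * (h.phone_accuracy - f) for g, h in zip(gammas, hyps)]
```

In a real system these per-hypothesis terms would be accumulated over every occurrence of a pronunciation variant, typically via forward-backward over lattices rather than an N-best list. The updated variant probabilities would also need renormalising per word, a step this sketch leaves out.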


Bibliographic reference.  Song, Meixu / Zhang, Qingqing / Pan, Jielin / Yan, Yonghong (2013): "Discriminative pronunciation modeling based on minimum phone error training", In INTERSPEECH-2013, 1941-1945.