Introducing pronunciation models into decoding has proven beneficial for large vocabulary continuous speech recognition (LVCSR). As Minimum Phone Error (MPE) training has become almost a standard scheme for acoustic modeling, a discriminative pronunciation modeling method is investigated within the MPE training framework. To bring the pronunciation models into MPE training, the auxiliary function of MPE training is rewritten at the word level and decomposed into two parts: one for co-training the acoustic models, and the other for discriminatively training the pronunciation models. On a Mandarin conversational telephone speech recognition task, compared to the baseline using a canonical lexicon, the discriminative pronunciation models reduced the Character Error Rate (CER) by 0.7% absolute on the LDC test set, and acoustic model co-training yielded a further CER reduction of about 1%.
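For readers unfamiliar with the metric, the sketch below shows how a Character Error Rate is commonly computed: the character-level edit distance between the reference and the hypothesis, divided by the reference length. It is an illustration only and not the scoring tool used in the paper; the function names `edit_distance` and `cer` and the example strings are assumptions introduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: character-level edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted character in a six-character reference.
example_cer = cer("今天天气很好", "今天天汽很好")
```

An absolute reduction, as reported above, is the plain difference between two such rates (baseline CER minus new CER), as opposed to a relative reduction, which divides that difference by the baseline CER.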
Bibliographic reference. Song, Meixu / Zhang, Qingqing / Pan, Jielin / Yan, Yonghong (2013): "Discriminative pronunciation modeling based on minimum phone error training", In INTERSPEECH-2013, 1941-1945.