In discriminative training methods, the objective function is designed to improve the performance of automatic speech recognition with reference to correct labels using a single system. On the other hand, system combination methods, which output refined hypotheses through a majority voting scheme, require building multiple systems that generate complementary hypotheses. This paper aims to unify both requirements within a discriminative training framework based on the mutual information criterion. That is, we construct complementary models by optimizing the proposed objective function, which minimizes the mutual information with the base systems' hypotheses while simultaneously maximizing that with the correct labels. We also show that this scheme corresponds to weighting the training data of a complementary system according to the correct and error tendencies of the base systems, which has a close relationship with boosting methods. In addition, the proposed method can practically construct complementary systems by simply extending a lattice-based parameter update algorithm used in discriminative training, and can adjust the degree to which the complementary system's outputs differ from those of the base systems. Experiments on highly noisy speech recognition ("The 2nd CHiME Challenge") show the effectiveness of the proposed method compared with a conventional system combination approach.
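The trade-off described in the abstract can be sketched as a combined objective. The form below is an illustrative reconstruction, not taken from the paper: I(O; W) denotes mutual information between acoustic observations O and word sequences under model parameters Λ, W_r are the correct labels, W_b the base systems' hypotheses, and the weight λ (an assumed hyperparameter) controls how strongly the complementary system diverges from the base systems:

```latex
% Hedged sketch of the proposed criterion (notation assumed, not from the paper):
% maximize mutual information with correct labels W_r,
% minimize mutual information with base-system hypotheses W_b.
\mathcal{F}(\Lambda) \;=\; I_{\Lambda}(O; W_r) \;-\; \lambda\, I_{\Lambda}(O; W_b)
```

Setting λ = 0 recovers standard MMI-style discriminative training; increasing λ pushes the complementary system's outputs further from those of the base systems, matching the adjustable divergence described above.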
Bibliographic reference. Tachioka, Yuuki / Watanabe, Shinji (2013): "Discriminative training of acoustic models for system combination", In INTERSPEECH-2013, 2355-2359.