The aim of this work is to improve the performance of lecture speech recognition through system combination. We propose a new combination technique in which various types of acoustic models are combined. Because system combination depends on complementary information, we prepare acoustic models that incorporate a variety of acoustic features by employing both continuous-mixture hidden Markov models (CMHMMs) and discrete-mixture hidden Markov models (DMHMMs), which have different patterns of recognition errors. In addition, we propose a new maximum mutual information (MMI) estimation of the DMHMM parameters. To evaluate the proposed method, we conduct recognition experiments on the Corpus of Spontaneous Japanese. In these experiments, a combination of CMHMMs and DMHMMs whose parameters were estimated using the MMI criterion exhibited the best recognition performance.
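For orientation, the MMI criterion named in the abstract is commonly written in the general form below. This is the standard textbook objective, not the paper's specific DMHMM estimation formulas, which the abstract does not give. All symbols are assumed notation: $\lambda$ denotes the acoustic model parameters, $O_r$ the $r$-th of $R$ training utterances, $W_r$ its reference transcription, and $W'$ ranges over competing word sequences.

```latex
% Standard MMI objective (assumed notation; acoustic scaling factor omitted).
% Numerator: likelihood of the reference transcription W_r.
% Denominator: total likelihood over all competing hypotheses W'.
\mathcal{F}_{\mathrm{MMI}}(\lambda)
  = \sum_{r=1}^{R}
    \log
    \frac{p_{\lambda}(O_r \mid W_r)\, P(W_r)}
         {\sum_{W'} p_{\lambda}(O_r \mid W')\, P(W')}
```

Maximizing this objective raises the posterior probability of the correct transcription relative to its competitors. In practice the denominator sum is usually approximated over a word graph (lattice) of recognition hypotheses rather than over all possible word sequences.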
Bibliographic reference. Kosaka, Tetsuo / Goto, Keisuke / Ito, Takashi / Kato, Masaharu (2010): "Lecture speech recognition by combining word graphs of various acoustic models", In INTERSPEECH-2010, 2978-2981.