11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Lecture Speech Recognition by Combining Word Graphs of Various Acoustic Models

Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Kato

Yamagata University, Japan

The aim of this work is to improve the performance of lecture speech recognition through a system combination approach. We propose a new combination technique in which various types of acoustic models are combined. Because system combination depends on complementary information, we prepare acoustic models that incorporate a variety of acoustic features by employing both continuous-mixture hidden Markov models (CMHMMs) and discrete-mixture hidden Markov models (DMHMMs). These models exhibit different patterns of recognition errors. In addition, we propose a new maximum mutual information (MMI) estimation method for the DMHMM parameters. To evaluate the proposed method, we conduct recognition experiments on the Corpus of Spontaneous Japanese. In these experiments, a combination of CMHMMs and DMHMMs whose parameters were estimated using the MMI criterion exhibited the best recognition performance.
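
For orientation, the MMI criterion mentioned in the abstract is commonly written in the textbook form below. This is only a sketch of the standard objective, not the specific DMHMM estimation formulas proposed in the paper, which the abstract does not give. All symbols are assumptions introduced here for illustration: $O_r$ is the $r$-th training utterance, $w_r$ its reference transcription, $M_w$ the composite HMM for word sequence $w$, $P(w)$ the language model probability, and $\lambda$ the acoustic model parameters.

```latex
\mathcal{F}_{\mathrm{MMI}}(\lambda)
  = \sum_{r=1}^{R}
    \log
    \frac{p_{\lambda}(O_r \mid M_{w_r})\, P(w_r)}
         {\sum_{w} p_{\lambda}(O_r \mid M_{w})\, P(w)}
```

Maximizing this quantity raises the likelihood of the reference transcription relative to all competing hypotheses. In practice, the sum in the denominator is usually approximated over a word graph or lattice of competing hypotheses.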

Bibliographic reference. Kosaka, Tetsuo / Goto, Keisuke / Ito, Takashi / Kato, Masaharu (2010): "Lecture speech recognition by combining word graphs of various acoustic models", in INTERSPEECH-2010, 2978-2981.