In this paper, we propose a new system combination approach for a large-vocabulary continuous speech recognition (LVCSR) system using speaker-class (SC) models, together with a speaker adaptation technique based on these SC models. The basic concept of the SC-based system is to train acoustic models on speakers who are acoustically similar to a target speaker. One of the major problems in using SC models is determining the selection range of the speakers; that is, it is difficult to decide how many speakers should be selected. To solve this problem, several SC models, each trained on a different number of speakers, are prepared in advance. In the recognition step, acoustically similar models are selected from these SC models, and the scores obtained from the selected models are merged using a word graph combination technique. The proposed method was evaluated on the Corpus of Spontaneous Japanese (CSJ) and showed significant improvement in a lecture speech recognition task.
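The two ideas in the abstract, building speaker classes of several sizes and merging scores across the resulting models, can be illustrated with a minimal sketch. This is not the authors' implementation: the speaker vectors, the Euclidean distance measure, the function names `build_speaker_classes` and `combine_word_scores`, and the per-slot score averaging are all assumptions made for illustration. In particular, the averaging is a simplified stand-in for word graph combination, which operates on full word lattices.

```python
import math


def build_speaker_classes(target, speakers, sizes):
    """Return one speaker class per selection size.

    Assumption: each speaker is summarized by a fixed-length vector and
    similarity is Euclidean distance; the abstract does not specify either.
    """
    ranked = sorted(speakers, key=lambda sid: math.dist(target, speakers[sid]))
    # Each class holds the n closest speakers; a real system would train
    # one SC acoustic model on the data of each class.
    return {n: ranked[:n] for n in sizes}


def combine_word_scores(model_outputs):
    """Average per-slot word scores over models and keep the best word.

    Assumption: every model emits the same number of word slots, each a
    dict mapping candidate words to scores. This is a simplification of
    word graph combination, not the technique itself.
    """
    n_models = len(model_outputs)
    best_words = []
    for slot in range(len(model_outputs[0])):
        totals = {}
        for output in model_outputs:
            for word, score in output[slot].items():
                totals[word] = totals.get(word, 0.0) + score / n_models
        best_words.append(max(totals, key=totals.get))
    return best_words


# Hypothetical toy data: four training speakers in a 2-D feature space.
speakers = {"s1": (0.0, 0.0), "s2": (1.0, 0.0), "s3": (0.0, 2.0), "s4": (5.0, 5.0)}
classes = build_speaker_classes((0.1, 0.0), speakers, sizes=[1, 2, 3])

# Hypothetical outputs of three SC models for a two-word utterance.
outputs = [
    [{"speech": 0.6, "speed": 0.4}, {"recognition": 0.9, "recondition": 0.1}],
    [{"speech": 0.3, "speed": 0.7}, {"recognition": 0.8, "recondition": 0.2}],
    [{"speech": 0.7, "speed": 0.3}, {"recognition": 0.6, "recondition": 0.4}],
]
print(classes)
print(combine_word_scores(outputs))
```

Preparing several class sizes in advance avoids committing to a single selection range: a model that disagrees with the others on one word, as the second toy model does here, can be outvoted when the scores are merged.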
Bibliographic reference. Kosaka, Tetsuo / Ito, Takashi / Kato, Masaharu / Kohda, Masaki (2010): "Speaker adaptation based on system combination using speaker-class models", In INTERSPEECH-2010, 546-549.