This article aims to provide a comprehensive set of acoustic model discriminative training results for the Corpus of Spontaneous Japanese (CSJ) lecture speech transcription task. Discriminative training was carried out for this task using a 100,000 word trigram for several acoustic model topologies, using both diagonal and full covariance models, and using both string-based and lattice-based training paradigms. We describe our implementation of the proposal by Macherey et al. for numerical subtraction of the reference lattice statistics from the competitor lattice statistics during lattice-based Minimum Classification Error (MCE) training. We also present results for lattice-based training that does not use such subtraction, corresponding to the well-known Maximum Mutual Information (MMI) approach. Discriminative training yielded relative reductions in Word Error Rate of up to 13%. Specific problems encountered in implementing discriminative training for this task are discussed.
Bibliographic reference. McDermott, Erik / Nakamura, Atsushi (2007): "String and lattice based discriminative training for the corpus of spontaneous Japanese lecture transcription task", In INTERSPEECH-2007, 2081-2084.