ISCA Archive MLSLP 2012

Structured discriminative models for speech recognition

Mark Gales, Anton Ragni, Austin Zhang, Rogier van Dalen

Generative models, normally in the form of hidden Markov models, have been the dominant form of acoustic model for automatic speech recognition for more than two decades. In recent years there has been interest in applying structured discriminative models to this task. This talk discusses one particular form of discriminative model, log-linear models, and how they may be applied to continuous speech recognition tasks. Two important issues will be discussed in detail: the appropriate form of features for this model, and the training criterion to be used. Generative models are proposed to extract the features for the discriminative log-linear model. This combination of generative and discriminative models enables state-of-the-art adaptation and noise-robustness approaches to be used to handle mismatches between the training and test conditions. An interesting aspect of these features is that the conditional independence assumptions of the underlying generative models are not necessarily reflected in the features derived from them. Various training criteria, including minimum Bayes risk and large-margin approaches, are discussed. The relationship between large-margin training of log-linear models and structured support vector machines is described. Results are presented on two noise-robustness tasks: AURORA-2 and AURORA-4.
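The abstract gives no equations, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It scores a fixed, hypothetical N-best list with a log-linear model P(w | O) = exp(alpha . phi(O, w)) / Z, where each feature vector phi(O, w) is assumed to be precomputed (for example, a log-likelihood from a generative model plus a language-model score). It then contrasts two training criteria of the kinds the abstract names: an MBR-style expected word-error loss, and a margin-rescaled hinge loss of the form used in structured SVMs. The function names, feature choices, and numbers are all illustrative; a real recognizer would score a far larger hypothesis space than three candidates.

```python
import math
from typing import Dict, List, Sequence


def linear_score(alpha: Sequence[float], phi: Sequence[float]) -> float:
    """Log-linear score alpha . phi(O, w) for one hypothesis w."""
    return sum(a * f for a, f in zip(alpha, phi))


def posteriors(alpha: Sequence[float], feats: Dict[str, List[float]]) -> Dict[str, float]:
    """P(w | O) = exp(alpha . phi(O, w)) / Z over a fixed candidate set.

    Uses log-sum-exp so large negative log-likelihood features do not underflow.
    """
    scores = {w: linear_score(alpha, phi) for w, phi in feats.items()}
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {w: math.exp(s - log_z) for w, s in scores.items()}


def word_errors(hyp: str, ref: str) -> int:
    """Word-level Levenshtein distance, used here as the task loss L(w, w_ref)."""
    h, r = hyp.split(), ref.split()
    row = list(range(len(r) + 1))  # distances for the empty hypothesis prefix
    for i, hw in enumerate(h, 1):
        diag, row[0] = row[0], i
        for j, rw in enumerate(r, 1):
            best = min(row[j] + 1,           # hypothesis word hw left unmatched
                       row[j - 1] + 1,       # reference word rw left unmatched
                       diag + (hw != rw))    # substitution, or match at no cost
            diag, row[j] = row[j], best
    return row[len(r)]


def expected_loss(alpha: Sequence[float], feats: Dict[str, List[float]], ref: str) -> float:
    """MBR-style criterion: sum over w of P(w | O) * L(w, w_ref)."""
    post = posteriors(alpha, feats)
    return sum(p * word_errors(w, ref) for w, p in post.items())


def margin_hinge(alpha: Sequence[float], feats: Dict[str, List[float]], ref: str) -> float:
    """Margin-rescaled structured hinge loss (as in structured SVMs):

        max_w [L(w, w_ref) + alpha . phi(O, w)] - alpha . phi(O, w_ref)

    Assumes ref is in feats, so the result is never negative. It is zero when
    the reference outscores every competitor by at least that competitor's loss.
    """
    ref_score = linear_score(alpha, feats[ref])
    worst = max(word_errors(w, ref) + linear_score(alpha, phi) for w, phi in feats.items())
    return worst - ref_score


# Hypothetical 3-best list; each feature vector is [generative-model
# log-likelihood, language-model log-probability]. Numbers are invented.
feats = {
    "the cat sat": [-10.0, -2.0],
    "the cat sad": [-11.0, -3.0],
    "a cat sat": [-12.0, -2.5],
}
ref = "the cat sat"
alpha = [0.1, 0.1]

post = posteriors(alpha, feats)
print({w: round(p, 3) for w, p in post.items()})
print("expected loss:", round(expected_loss(alpha, feats, ref), 3))
print("hinge loss:", round(margin_hinge(alpha, feats, ref), 3))
```

With the small weights above, the reference wins but not by a margin proportional to the competitors' word errors, so the hinge loss is positive; scaling alpha up to [1.0, 1.0] in this toy example drives it to zero. Rescaling the margin by the task loss is what ties large-margin training to the structured SVM view: hypotheses with more word errors must be separated from the reference by a larger score gap.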

Cite as: Gales, M., Ragni, A., Zhang, A., Dalen, R.v. (2012) Structured discriminative models for speech recognition. Proc. Machine Learning in Speech and Language Processing (MLSLP 2012)

  author={Mark Gales and Anton Ragni and Austin Zhang and Rogier van Dalen},
  title={{Structured discriminative models for speech recognition}},
  booktitle={Proc. Machine Learning in Speech and Language Processing (MLSLP 2012)}