ISCA Archive Eurospeech 2001

Investigations into tandem acoustic modeling for the Aurora task

Daniel P.W. Ellis, Manuel J. Reyes Gomez

In tandem acoustic modeling, signal features are first processed by a discriminatively trained neural network, and the outputs of this network are then treated as the feature inputs to a conventional distribution-modeling Gaussian-mixture model (GMM) speech recognizer. This arrangement achieves relative error rate reductions of 30% or more on the Aurora task, and it supports feature stream combination at the posterior level, which can eliminate more than 50% of the errors compared to the HTK baseline. In this paper, we explore a number of variations on the tandem structure: we experiment with changing the subword units used in each model (neural net and GMM), varying the data subsets used to train each model, substituting the posterior calculations in the neural net with a second GMM, and applying a variety of feature conditioning steps, such as deltas, normalization and PCA rank reduction, in the 'tandem domain', i.e. between the two models.
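
As a rough illustration of what conditioning in the tandem domain might look like, the sketch below takes toy per-frame posteriors from two hypothetical feature streams, combines them, normalizes the result, and appends deltas before the features would be handed to a GMM recognizer. It is not the authors' implementation: the function names, the toy values, the averaging of log posteriors as the combination rule, and the simple two-point delta are all assumptions made for illustration. PCA rank reduction is left out to keep the example dependency-free.

```python
import math

def log_posteriors(posteriors, floor=1e-10):
    """Take the log of each network output, flooring to avoid log(0)."""
    return [[math.log(max(p, floor)) for p in frame] for frame in posteriors]

def combine_streams(log_a, log_b):
    """Average two streams' log posteriors frame by frame.
    Assumption: averaging is just one possible combination rule."""
    return [[(x + y) / 2.0 for x, y in zip(fa, fb)]
            for fa, fb in zip(log_a, log_b)]

def mean_variance_normalize(frames):
    """Scale each dimension to zero mean and unit variance over the utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # Guard against constant dimensions to avoid division by zero.
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]

def add_deltas(frames):
    """Append first-order differences (next minus previous, halved),
    repeating the edge frames at the utterance boundaries."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        delta = [(b - a) / 2.0 for a, b in zip(prev, nxt)]
        out.append(list(frames[t]) + delta)
    return out

# Toy posteriors: 3 frames x 3 subword classes per stream (made-up values).
stream_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
stream_b = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

combined = combine_streams(log_posteriors(stream_a), log_posteriors(stream_b))
tandem_features = add_deltas(mean_variance_normalize(combined))
print(len(tandem_features), len(tandem_features[0]))
```

Each output frame holds the normalized static values followed by their deltas, so the feature dimension doubles relative to the number of network outputs.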


doi: 10.21437/Eurospeech.2001-70

Cite as: Ellis, D.P.W., Gomez, M.J.R. (2001) Investigations into tandem acoustic modeling for the Aurora task. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 189-192, doi: 10.21437/Eurospeech.2001-70

@inproceedings{ellis01_eurospeech,
  author={Daniel P.W. Ellis and Manuel J. Reyes Gomez},
  title={{Investigations into tandem acoustic modeling for the Aurora task}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={189--192},
  doi={10.21437/Eurospeech.2001-70}
}