Acoustic models for speech recognition are often trained on data coming from a variety of sources. The usual approach is to pool all of the available training data, treating it as a single training set. In this work, assuming that each source may have a different degree of relevance for a given target task, two techniques are proposed to weight subsets of the training data. The first is based on the interpolation of the model probability densities, while the second is based on data weighting. A method to automatically select the interpolation coefficients is also proposed. The best technique presented here outperformed unsupervised MAP adaptation and led to improvements in word accuracy (up to 6% relative) over the pooled model.
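The density-interpolation idea can be illustrated with a minimal sketch: combine source-specific densities as a convex mixture, p(x) = Σ_s λ_s p_s(x), and pick the coefficients automatically by maximizing likelihood on held-out target-task data. The single-Gaussian models, the grid search, and all function names below are illustrative assumptions, not the paper's actual implementation (which operates on full acoustic models):

```python
import math

def gauss_pdf(x, mean, var):
    # density of a 1-D Gaussian; stands in for a per-source acoustic model
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def interp_pdf(x, models, lambdas):
    # linear interpolation of the source-specific probability densities
    return sum(l * gauss_pdf(x, m, v) for l, (m, v) in zip(lambdas, models))

def select_lambda(dev_data, models, grid=21):
    # automatic coefficient selection for two sources: grid search over
    # lambda in [0, 1], maximizing held-out log-likelihood (an assumed
    # stand-in for the paper's selection method)
    best_lam, best_ll = None, None
    for i in range(grid):
        lam = i / (grid - 1)
        ll = sum(math.log(interp_pdf(x, models, [lam, 1.0 - lam]) + 1e-300)
                 for x in dev_data)
        if best_ll is None or ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

# two hypothetical source models; dev data drawn near the first source's mode
models = [(0.0, 1.0), (5.0, 1.0)]
lam = select_lambda([0.1, -0.2, 0.05], models)
```

With the development data matching the first source, the search drives the coefficient toward that source, mirroring the intuition that more relevant data subsets should receive higher weight.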
Bibliographic reference. Fraga-Silva, Thiago / Gauvain, Jean-Luc / Lamel, Lori (2013): "Interpolation of acoustic models for speech recognition", in Proc. INTERSPEECH 2013, pp. 3347-3351.