In current HMM/DNN speech recognition systems, the purpose of the DNN component is to estimate the posterior probabilities of tied triphone states. In most cases, the distribution of these states is uneven, meaning that the number of training samples differs markedly across states. This imbalance in the training data is a source of suboptimality for most machine learning algorithms, and DNNs are no exception. A straightforward solution is to re-sample the data, either by upsampling the rarer classes or by downsampling the more common ones. Here, we experiment with the so-called probabilistic sampling method, which applies downsampling and upsampling at the same time. For this, it defines a new class distribution for the training data as a linear combination of the original and the uniform class distributions. As an extension to previous studies, we propose a new method to re-estimate the class priors, which is required to remedy the mismatch between the training and test data distributions introduced by re-sampling. Using probabilistic sampling and the proposed modification, we report 5% and 6% relative error rate reductions on the TED-LIUM and AMI corpora, respectively.
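To make the sampling scheme concrete, below is a minimal sketch of probabilistic sampling as described in the abstract: per-class selection probabilities are formed as a linear combination of the uniform and the original (empirical) class distributions, and minibatches are drawn by first picking a class from this distribution and then a frame from that class. The names `lam`, `probabilistic_sampling_distribution`, `sample_minibatch`, and `frames_by_class` are our own illustrative choices, not the paper's notation, and the sketch does not implement the paper's proposed class-prior re-estimation step.

```python
import numpy as np

def probabilistic_sampling_distribution(class_counts, lam):
    """Per-class selection probabilities: lam * uniform + (1 - lam) * empirical.

    lam = 0.0 keeps the original class distribution, lam = 1.0 yields
    uniform class sampling. (The mixing-weight symbol is an assumption;
    the paper's own notation may differ.)
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    empirical = counts / counts.sum()                  # original class priors
    uniform = np.full_like(empirical, 1.0 / len(empirical))
    return lam * uniform + (1.0 - lam) * empirical

def sample_minibatch(frames_by_class, class_probs, batch_size, rng):
    """Draw a minibatch by first sampling a class, then a frame from it.

    frames_by_class: list whose entry c holds the training frames of class c.
    """
    batch = []
    classes = rng.choice(len(frames_by_class), size=batch_size, p=class_probs)
    for c in classes:
        frames = frames_by_class[c]
        batch.append(frames[rng.integers(len(frames))])
    return batch

# Example: three tied states with strongly unbalanced frame counts.
rng = np.random.default_rng(0)
probs = probabilistic_sampling_distribution([100000, 5000, 200], lam=0.5)
print(probs)  # rare classes are boosted, frequent ones damped
```

With lam between 0 and 1 this simultaneously downsamples the frequent states and upsamples the rare ones; because the DNN is then trained on a distribution that differs from the test data, the output posteriors must be rescaled with appropriately re-estimated priors, which is the issue the paper's proposed modification addresses.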
Cite as: Grósz, T., Gosztolya, G., Tóth, L. (2017) Training Context-Dependent DNN Acoustic Models Using Probabilistic Sampling. Proc. Interspeech 2017, 1621-1625, doi: 10.21437/Interspeech.2017-338
@inproceedings{grosz17_interspeech,
  author={Tamás Grósz and Gábor Gosztolya and László Tóth},
  title={{Training Context-Dependent DNN Acoustic Models Using Probabilistic Sampling}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1621--1625},
  doi={10.21437/Interspeech.2017-338}
}