Although Gaussian mixture models are commonly used in acoustic models for speech recognition, there is no standard method for determining the number of mixture components. Most models arbitrarily assign the number of mixture components with little justification. While model selection techniques with a mathematical derivation, such as the Bayesian information criterion (BIC), have been applied, these criteria focus on properly modeling the true distribution of individual tied-states (senones) without considering the entire acoustic model; this leads to suboptimal speech recognition performance. In this paper we present a method to generate statistically-justified acoustic models that consider inter-senone effects by modifying the BIC. Experimental results in the CMU Communicator domain show that in contrast to previous strategies, the new method generates not only attractively smaller acoustic models, but also ones with lower word error rate.
Bibliographic reference. Yu, Kai / Rutenbar, Rob A. (2007): "Generating small, accurate acoustic models with a modified Bayesian information criterion", In INTERSPEECH-2007, 2109-2112.