Sixth International Conference on Spoken Language Processing
The use of context-dependent phonetic units together with Gaussian mixture models allows modern speech recognizers to build very complex and accurate acoustic models. However, because of data sparseness, some sharing of data across different triphone states is necessary. Acoustic model design is typically done in two stages: designing the state-tying map and growing the number of mixtures in each tied state. In the design of the state-tying map, single Gaussians are used to represent the data, ignoring the fact that a single Gaussian is an insufficient model. In this paper, we propose a simple modification to the two-stage process by adding a third stage. In this added stage, the state-tying tree is pruned, and the pruning is based on the mixture representation of the tied states. We propose using the Bayesian Information Criterion (BIC) as the criterion for this pruning and show that, by adding this step, the resulting model is more compact and gives better recognition accuracy on the Resource Management (RM) task.
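The pruning decision described above can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the code assumes the common textbook form of BIC (log-likelihood minus half the parameter count times the log of the frame count), diagonal-covariance mixtures, and a penalty computed on the pooled frame count. All names (`NodeStats`, `gmm_param_count`, `bic`, `should_prune`, `penalty_weight`) are hypothetical and are not taken from the paper.

```python
import math
from typing import NamedTuple


class NodeStats(NamedTuple):
    """Statistics of one tied state (tree node) after mixture training."""
    log_likelihood: float  # total log-likelihood of the node's frames under its GMM
    num_mixtures: int      # number of Gaussian components in the GMM
    num_frames: int        # number of training frames assigned to the node


def gmm_param_count(num_mixtures: int, dim: int) -> int:
    # Assumption: diagonal covariances, so each component has `dim` means
    # and `dim` variances; the weights contribute (num_mixtures - 1) free values.
    return num_mixtures * 2 * dim + (num_mixtures - 1)


def bic(log_likelihood: float, num_params: int, num_frames: int,
        penalty_weight: float = 1.0) -> float:
    # Assumed standard form: larger is better.
    return log_likelihood - 0.5 * penalty_weight * num_params * math.log(num_frames)


def should_prune(parent: NodeStats, left: NodeStats, right: NodeStats,
                 dim: int, penalty_weight: float = 1.0) -> bool:
    """Return True if the merged (parent) model scores at least as well
    as keeping the two children as separate tied states."""
    total_frames = left.num_frames + right.num_frames
    parent_score = bic(parent.log_likelihood,
                       gmm_param_count(parent.num_mixtures, dim),
                       parent.num_frames, penalty_weight)
    # Assumption: the split model is scored jointly, with summed
    # log-likelihoods and parameters and the pooled frame count.
    split_params = (gmm_param_count(left.num_mixtures, dim)
                    + gmm_param_count(right.num_mixtures, dim))
    split_score = bic(left.log_likelihood + right.log_likelihood,
                      split_params, total_frames, penalty_weight)
    return parent_score >= split_score


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    parent = NodeStats(-1000.0, 2, 200)
    left = NodeStats(-480.0, 2, 100)
    right = NodeStats(-490.0, 2, 100)
    print(should_prune(parent, left, right, dim=2))
```

Under these assumptions, a split survives only when its likelihood gain outweighs the penalty for the additional mixture parameters, which is the sense in which pruning can make the model more compact.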
Bibliographic reference. Chan, Yu-Chung / Siu, Manhung / Mak, Brian (2000): "Pruning of state-tying tree using bayesian information criterion with multiple mixtures", In ICSLP-2000, vol.4, 294-297.