Sixth European Conference on Speech Communication and Technology
In many HMM-based continuous speech recognition systems, decision-tree-based state tying is used not only to improve the robustness and accuracy of context-dependent acoustic modeling but also to synthesize unseen models. The standard method constructs the phonetic decision tree by clustering states with single-Gaussian triphone models only. The coarse clusters this produces can lead to low-accuracy acoustic models and, in turn, to poor recognition performance. This paper proposes a multi-stage decision tree that uses both multi-mixture Gaussian models and single-Gaussian models. A continuous speech recognition experiment using this approach on WSJ data showed a reduction in word error rate compared with the standard decision-tree-based system.
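For readers unfamiliar with the baseline, the following is a minimal sketch of the single-Gaussian split criterion commonly used in standard decision-tree state tying. It is not the multi-stage method proposed in the paper. It assumes diagonal covariances and that each state carries an occupancy count, a mean vector, and a variance vector. The state layout, the question names, and the functions `pooled_log_likelihood`, `split_gain`, and `best_question` are all hypothetical names chosen for illustration.

```python
import math

def pooled_log_likelihood(states):
    """Approximate log-likelihood of the data in `states` under one shared
    diagonal Gaussian, computed from per-state statistics only."""
    total = sum(s["occ"] for s in states)
    dim = len(states[0]["mean"])
    log_det = 0.0
    for d in range(dim):
        mean = sum(s["occ"] * s["mean"][d] for s in states) / total
        second = sum(s["occ"] * (s["var"][d] + s["mean"][d] ** 2)
                     for s in states) / total
        log_det += math.log(second - mean ** 2)  # pooled variance in dim d
    return -0.5 * total * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)

def split_gain(states, question):
    """Likelihood gain from splitting `states` by a yes/no question."""
    yes = [s for s in states if question(s)]
    no = [s for s in states if not question(s)]
    if not yes or not no:
        return 0.0  # the question does not actually split the cluster
    return (pooled_log_likelihood(yes) + pooled_log_likelihood(no)
            - pooled_log_likelihood(states))

def best_question(states, questions):
    """Return (name, gain) of the question with the largest gain."""
    return max(((name, split_gain(states, q)) for name, q in questions.items()),
               key=lambda item: item[1])

# Toy 1-D example: hypothetical states keyed by their left phone context.
states = [
    {"left": "m", "occ": 10.0, "mean": [0.0], "var": [1.0]},
    {"left": "n", "occ": 10.0, "mean": [0.2], "var": [1.0]},
    {"left": "s", "occ": 10.0, "mean": [5.0], "var": [1.0]},
    {"left": "f", "occ": 10.0, "mean": [5.2], "var": [1.0]},
]
questions = {
    "L_nasal": lambda s: s["left"] in {"m", "n"},
    "L_m_or_s": lambda s: s["left"] in {"m", "s"},
}
name, gain = best_question(states, questions)
```

In this toy data the nasal-context question separates the two groups of similar means, so it yields the larger gain. A full tree builder would apply the best question recursively until the gain or the occupancy falls below a threshold.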
Bibliographic reference. Kim, DongHwa / Liu, Chaojun / Wu, Xintian / Yan, Yonghong (1999): "High accuracy acoustic modeling based on multi-stage decision tree", In EUROSPEECH'99, 1335-1338.