8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Combining Agglomerative and Tree-Based State Clustering for High Accuracy Acoustic Modeling

Zhaobing Han, Shuwu Zhang, Bo Xu

Chinese Academy of Sciences, China

Robustly estimating a large number of parameters from a limited amount of training data is a crucial issue in triphone-based continuous speech recognition. To cope with this issue, two major context-clustering methods, agglomerative (AGG) and tree-based (TB), have been widely studied. In this paper, we analyze the two algorithms with respect to their advantages and disadvantages and introduce a novel combined method that exploits the strengths of each to cluster and tie similar acoustic states for highly detailed acoustic models. In addition, we devise a two-level clustering approach for TB, which applies tree-based state tying twice to rare acoustic-phonetic events. In large-vocabulary continuous speech recognition (LVCSR) experiments, the proposed combined method substantially improved performance compared with using the popular TB method alone.

Bibliographic reference. Han, Zhaobing / Zhang, Shuwu / Xu, Bo (2004): "Combining agglomerative and tree-based state clustering for high accuracy acoustic modeling", in INTERSPEECH-2004, 393-396.