INTERSPEECH 2004 - ICSLP
Robustly estimating a large number of parameters from a limited amount of training data is a crucial issue in triphone-based continuous speech recognition. To cope with this issue, two major context-clustering methods, agglomerative (AGG) and tree-based (TB), have been widely studied. In this paper, we analyze the advantages and disadvantages of the two algorithms and introduce a novel combined method that exploits the strengths of each to cluster and tie similar acoustic states in highly detailed acoustic models. In addition, we devise a two-level clustering approach for TB, which applies tree-based state tying twice for rare acoustic-phonetic events. In large-vocabulary continuous speech recognition (LVCSR) experiments, the proposed combined method substantially improved performance compared with the popular TB method alone.
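As a rough illustration of the agglomerative (AGG) side only, the sketch below merges under-trained states bottom-up until every cluster has enough training frames. It is not the authors' algorithm: the `State` class, the 1-D Gaussian toy features, the `distance` metric, and the `min_count` threshold are all assumptions made for this example, and the tree-based and combined methods are not shown.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class State:
    names: tuple   # triphone state labels tied into this cluster
    count: float   # occupancy: number of training frames (hypothetical values)
    mean: float    # mean of a 1-D Gaussian (toy stand-in for real features)
    var: float     # variance of that Gaussian


def merge(a, b):
    """Pool two clusters into one Gaussian, weighting by occupancy."""
    n = a.count + b.count
    mean = (a.count * a.mean + b.count * b.mean) / n
    # Pooled second moment minus the squared pooled mean gives the variance.
    second = (a.count * (a.var + a.mean ** 2) + b.count * (b.var + b.mean ** 2)) / n
    return State(a.names + b.names, n, mean, second - mean ** 2)


def distance(a, b):
    """Variance-normalised squared distance between means (assumed metric)."""
    return (a.mean - b.mean) ** 2 / (a.var ** 0.5 * b.var ** 0.5)


def agglomerate(states, min_count):
    """Merge closest pairs until no cluster has fewer than min_count frames."""
    clusters = list(states)
    while len(clusters) > 1 and min(c.count for c in clusters) < min_count:
        # Consider only pairs in which at least one cluster is under-trained.
        candidates = [
            (i, j)
            for i, j in combinations(range(len(clusters)), 2)
            if clusters[i].count < min_count or clusters[j].count < min_count
        ]
        i, j = min(candidates, key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Toy data: two well-trained states and two rare ones (all values invented).
toy = [
    State(("a-b+c",), 500, 0.0, 1.0),
    State(("d-b+c",), 20, 0.1, 1.0),
    State(("a-b+e",), 600, 5.0, 1.0),
    State(("x-b+e",), 10, 5.2, 1.0),
]
tied = agglomerate(toy, min_count=100)
```

In this toy run each rare state is absorbed by its acoustically nearest neighbor. A purely data-driven merge of this kind can only tie triphones that were observed in training, which is the usual motivation for also considering a tree-based method.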
Bibliographic reference. Han, Zhaobing / Zhang, Shuwu / Xu, Bo (2004): "Combining agglomerative and tree-based state clustering for high accuracy acoustic modeling", In INTERSPEECH-2004, 393-396.