Sixth European Conference on Speech Communication and Technology
State-shared, context-dependent acoustic HMMs are the basis of practically all state-of-the-art large-vocabulary speech recognition systems. The topology, i.e. the state sharing, is usually trained by decision-tree-based clustering of similar phonetic contexts, i.e. divisive clustering at the state level. In this paper, we show that Phonetic Decision Trees (PDT) and Maximum Likelihood Successive State Splitting (ML-SSS) can be regarded as variants of the same fundamental partitioning algorithm. The main difference is that ML-SSS allows all possible phoneme combination sets, whereas PDT limits the possible phoneme combination sets using phonological information that has been decided a priori and heuristically. A combination of PDT and ML-SSS outperformed both PDT alone and ML-SSS alone on a non-read Japanese speech recognition task. To solve the problem of unseen contexts occurring in ML-SSS, the Split History Backoff algorithm is introduced.
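The relationship between the two methods can be illustrated with a toy sketch of a single maximum-likelihood split. This is not the authors' implementation. It assumes one-dimensional observations, a single Gaussian per cluster, and exhaustive enumeration of context subsets for the unrestricted case, which is feasible only for a handful of contexts. The function names, the question sets, and the data are all hypothetical. The only difference between the two calls is the candidate set passed to `best_split`.

```python
import math
from itertools import combinations


def gaussian_loglik(samples, var_floor=1e-3):
    """Log-likelihood of samples under their own ML Gaussian (1-D, floored variance)."""
    n = len(samples)
    mean = sum(samples) / n
    var = max(sum((x - mean) ** 2 for x in samples) / n, var_floor)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def split_gain(data, subset):
    """Likelihood gain from splitting the pooled data into `subset` and its complement."""
    left = [x for ctx, xs in data.items() if ctx in subset for x in xs]
    right = [x for ctx, xs in data.items() if ctx not in subset for x in xs]
    if not left or not right:
        return float("-inf")  # not a real split
    return gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(left + right)


def best_split(data, candidates):
    """Pick the candidate context subset with the largest likelihood gain."""
    return max(candidates, key=lambda subset: split_gain(data, subset))


def all_subsets(contexts):
    """Every non-empty proper subset of the contexts (unrestricted candidate set)."""
    ordered = sorted(contexts)
    for size in range(1, len(ordered)):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


# Hypothetical toy data: observations of one state, grouped by phonetic context.
data = {
    "p": [0.0, 0.2, -0.1],
    "n": [0.1, -0.2, 0.0],
    "b": [5.0, 5.1, 4.9],
    "m": [5.2, 4.8, 5.0],
}

# PDT-style: candidates limited to a priori phonological questions (hypothetical).
questions = [frozenset({"p", "b"}), frozenset({"p", "b", "m"})]
pdt_choice = best_split(data, questions)

# Unrestricted: any phoneme combination set may define the split.
free_choice = best_split(data, all_subsets(data))

print("question-restricted split:", sorted(pdt_choice), split_gain(data, pdt_choice))
print("unrestricted split:", sorted(free_choice), split_gain(data, free_choice))
```

Because the question sets are a subset of all possible partitions, the unrestricted search can never achieve a lower likelihood gain on the training data than the question-restricted one. The trade-off noted in the abstract is that a data-driven partition says nothing about contexts absent from the training data, whereas a phonological question can still be answered for an unseen context.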
Bibliographic reference. Singer, Harald / Nakamura, Atsushi (1999): "Unified framework for acoustic topology modelling: ML-SSS and question-based decision trees", In EUROSPEECH'99, 1355-1358.