Modern automatic speech recognition systems use Gaussian mixture models (GMMs) on acoustic observations to model the probability of producing a given observation under any one of many hidden discrete phonetic states. This paper investigates the feasibility of using an acoustic decision tree to model these probabilities directly. Unlike the more common phonetic decision tree, which asks questions about phonetic context, an acoustic decision tree asks questions about the vector-valued observations. Three types of acoustic questions are proposed and evaluated: LDA, PCA, and MMI questions. Frame classification experiments are run on a subset of the Switchboard corpus. In these experiments, the acoustic decision tree produces slightly better results than maximum likelihood trained GMMs, with significantly less computation. Some theoretical advantages of the acoustic decision tree are discussed, including more economical use of the training data and reduced mismatch between the acoustic model and the true probability distribution of the phonetic labels.
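As a rough illustration of what "asking questions about the vector-valued observations" might look like, the following minimal sketch walks an observation down a small tree whose internal nodes threshold a linear projection of the feature vector. The abstract does not give the tree's details, so everything here is an assumption made for illustration: the `Node` layout, the helper names, the hand-set projection direction (which in the paper's setting would presumably come from LDA, PCA, or MMI), the add-one smoothing at the leaves, and the toy labels. Converting the leaf distributions into observation probabilities for decoding is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Node:
    # Hypothetical layout: an internal node asks "is dot(direction, x) > threshold?";
    # a leaf node carries a distribution over phonetic states instead.
    direction: Optional[Sequence[float]] = None
    threshold: float = 0.0
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    posteriors: Optional[Dict[str, float]] = None  # set on leaves only


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def leaf_posteriors(labels: List[str], states: List[str],
                    smoothing: float = 1.0) -> Dict[str, float]:
    """Smoothed relative frequency of each state among the frames at a leaf.

    The additive smoothing is an assumption of this sketch, not the paper's method.
    """
    total = len(labels) + smoothing * len(states)
    return {s: (labels.count(s) + smoothing) / total for s in states}


def state_posteriors(tree: Node, x: Sequence[float]) -> Dict[str, float]:
    """Answer acoustic questions from the root until a leaf is reached."""
    node = tree
    while node.posteriors is None:
        node = node.yes if dot(node.direction, x) > node.threshold else node.no
    return node.posteriors


# Toy tree with one question on the first feature dimension (all values made up).
STATES = ["aa", "s"]
TREE = Node(
    direction=(1.0, 0.0),
    threshold=0.5,
    yes=Node(posteriors=leaf_posteriors(["aa", "aa", "s"], STATES)),
    no=Node(posteriors=leaf_posteriors(["s", "s"], STATES)),
)

if __name__ == "__main__":
    print(state_posteriors(TREE, (0.9, 0.1)))
    print(state_posteriors(TREE, (0.2, 3.0)))
```

Evaluating a frame in this sketch costs one inner product per level of the tree, rather than one Gaussian evaluation per mixture component, which is one plausible reading of the reduced computation mentioned above.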
Bibliographic reference. Droppo, Jasha / Seltzer, Michael L. / Acero, Alex / Chiu, Yu-Hsiang Bosco (2008): "Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation", In INTERSPEECH-2008, 289-292.