9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Towards a Non-Parametric Acoustic Model: An Acoustic Decision Tree for Observation Probability Calculation

Jasha Droppo (1), Michael L. Seltzer (1), Alex Acero (1), Yu-Hsiang Bosco Chiu (2)

(1) Microsoft Research, USA; (2) Carnegie Mellon University, USA

Modern automatic speech recognition systems use Gaussian mixture models (GMMs) on acoustic observations to model the probability of producing a given observation under any one of many hidden discrete phonetic states. This paper investigates the feasibility of using an acoustic decision tree to model these probabilities directly. Unlike the more common phonetic decision tree, which asks questions about phonetic context, an acoustic decision tree asks questions about the vector-valued observations. Three types of acoustic question are proposed and evaluated: questions based on linear discriminant analysis (LDA), principal component analysis (PCA), and maximum mutual information (MMI). Frame classification experiments are run on a subset of the Switchboard corpus. In these experiments, the acoustic decision tree produces slightly better results than maximum-likelihood-trained GMMs, with significantly less computation. Some theoretical advantages of the acoustic decision tree are discussed, including more economical use of the training data and reduced mismatch between the acoustic model and the true probability distribution of the phonetic labels.
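
To make the idea of an acoustic question concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how questions are constructed or how leaf probabilities are estimated, so everything here is an assumption: each question is a hyperplane test along the top principal direction of the frames at a node (in the spirit of a PCA question), the threshold is the projection of the node mean, and each leaf stores add-one-smoothed label frequencies as state posteriors. The names `principal_direction`, `build_tree`, and `state_posteriors` are hypothetical.

```python
import math
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_direction(frames, iters=50):
    """Return (mean, unit vector) for the top principal axis via power iteration."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    centered = [[f[d] - mean[d] for d in range(dim)] for f in frames]
    # Slightly asymmetric start vector; an arbitrary choice for this sketch.
    v = [1.0 + 0.1 * d for d in range(dim)]
    norm = math.sqrt(dot(v, v))
    v = [a / norm for a in v]
    for _ in range(iters):
        # Covariance-times-vector product without forming the covariance matrix.
        w = [0.0] * dim
        for x in centered:
            p = dot(x, v)
            for d in range(dim):
                w[d] += p * x[d]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return mean, v


def make_leaf(labels, states):
    """Leaf holding add-one-smoothed label frequencies (assumed estimator)."""
    counts = Counter(labels)
    total = len(labels) + len(states)
    return {"posterior": {s: (counts[s] + 1) / total for s in states}}


def build_tree(frames, labels, states, depth=0, max_depth=3, min_frames=4):
    """Recursively split frames with one hyperplane question per internal node."""
    if depth == max_depth or len(frames) < min_frames or len(set(labels)) == 1:
        return make_leaf(labels, states)
    mean, direction = principal_direction(frames)
    threshold = dot(mean, direction)
    right = [dot(f, direction) > threshold for f in frames]
    if all(right) or not any(right):
        return make_leaf(labels, states)  # the question failed to split the data

    def subset(side):
        fs = [f for f, r in zip(frames, right) if r == side]
        ls = [l for l, r in zip(labels, right) if r == side]
        return build_tree(fs, ls, states, depth + 1, max_depth, min_frames)

    return {"direction": direction, "threshold": threshold,
            "left": subset(False), "right": subset(True)}


def state_posteriors(tree, frame):
    """Answer the acoustic questions for one frame and return the leaf posteriors."""
    node = tree
    while "posterior" not in node:
        go_right = dot(frame, node["direction"]) > node["threshold"]
        node = node["right"] if go_right else node["left"]
    return node["posterior"]
```

Evaluating a frame costs one dot product per level of the tree, which illustrates why such a model can need less computation than scoring every Gaussian in a mixture. The sketch returns state posteriors at the leaf; how the paper turns tree outputs into observation probabilities for decoding is not stated in the abstract.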

Bibliographic reference. Droppo, Jasha / Seltzer, Michael L. / Acero, Alex / Chiu, Yu-Hsiang Bosco (2008): "Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation", In INTERSPEECH-2008, 289-292.