In this paper, we experiment with decision trees as replacements for Gaussian mixture models to compute the observation likelihoods for a given HMM state in a speech recognition system. Decision trees have a number of advantageous properties, such as that they do not impose restrictions on the number or types of features, and that they automatically perform feature selection. In fact, due to the conditional nature of the decision tree evaluation process, the subset of features that is actually used during recognition depends on the input signal. Automatic state-tying can be incorporated directly into the acoustic model as well, and it too becomes a function of the input signal. Experimental results for the Aurora 2 speech database show that a system using decision trees offers state-of-the-art performance, even without taking advantage of its full potential.
Bibliographic reference. Teunen, Remco / Akamine, Masami (2007): "HMM-based speech recognition using decision trees instead of GMMs", In INTERSPEECH-2007, 2097-2100.