16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Modeling Phonetic Context with Non-Random Forests for Speech Recognition

Hainan Xu, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur

Johns Hopkins University, USA

Modern speech recognition systems typically cluster triphone phonetic contexts using decision trees. In this paper we describe a way to build multiple complementary decision trees from the same data, for the purpose of system combination. We do this by jointly building the decision trees using an objective function that has an added entropy term to encourage diversity among the decision trees. After the trees are built, the systems are built in the standard way and the emission probabilities are combined during decoding. Experiments on multiple datasets show gains from the use of multiple trees, at the expense of evaluating multiple models at test time.
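
The two ideas in the abstract, an entropy term that rewards diversity among trees and the combination of emission probabilities at decoding time, can be illustrated with a minimal sketch. The abstract does not give the exact form of either, so everything here is an assumption made for illustration: the function names `joint_leaf_entropy` and `combine_log_likelihoods` are hypothetical, diversity is measured as the entropy of the joint leaf assignment over samples, and combination is taken to be a weighted average of per-model log-likelihoods.

```python
import math
from collections import Counter


def joint_leaf_entropy(leaf_ids_per_tree):
    """Entropy (in nats) of the joint leaf assignment across trees.

    leaf_ids_per_tree[t][i] is the leaf that tree t assigns to sample i.
    ASSUMPTION: this is one plausible diversity measure, not necessarily
    the entropy term used in the paper.
    """
    # Each sample maps to a tuple of leaves, one leaf per tree.
    joint = Counter(zip(*leaf_ids_per_tree))
    total = sum(joint.values())
    return -sum((c / total) * math.log(c / total) for c in joint.values())


def combine_log_likelihoods(per_model_loglikes, weights=None):
    """Weighted average of per-model emission log-likelihoods for one frame.

    ASSUMPTION: uniform weights and log-domain averaging are illustrative
    choices; the abstract only says the probabilities are combined.
    """
    n = len(per_model_loglikes)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * ll for w, ll in zip(weights, per_model_loglikes))


# Two identical trees split four samples into the same two groups.
identical = joint_leaf_entropy([[0, 0, 1, 1], [0, 0, 1, 1]])
# Two complementary trees split the same samples along different lines.
diverse = joint_leaf_entropy([[0, 0, 1, 1], [0, 1, 0, 1]])
# One frame scored by two models built on different trees.
combined = combine_log_likelihoods([-2.0, -4.0])
```

Under this measure, trees that partition the data differently yield a higher joint entropy than identical trees, which is the sense in which an added entropy term could encourage diversity. The cost noted in the abstract shows up in `combine_log_likelihoods`: every frame has to be scored by each model before the scores are merged.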


Bibliographic reference.  Xu, Hainan / Chen, Guoguo / Povey, Daniel / Khudanpur, Sanjeev (2015): "Modeling phonetic context with non-random forests for speech recognition", In INTERSPEECH-2015, 2117-2121.