7th International Conference on Spoken Language Processing
September 16-20, 2002
The paper presents an automatic method for devising the question sets used for the induction of classification and regression trees. The algorithm employed is the well-known mutual-information-based bottom-up clustering, applied to phone bigram statistics. The sets of phones at the nodes of the resulting binary tree are used as question sets for clustering context-sensitive (tri-phone) HMM output distributions in a large vocabulary speech recognizer. The algorithm is shown to perform as well as, and sometimes significantly better than, question sets devised by human experts for a Spanish system and a German system, each evaluated on several tasks. It eliminates the need for linguistic expertise and also provides a faster solution.
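To make the clustering idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a greedy agglomerative procedure that starts with one class per phone and repeatedly merges the pair of classes whose merge keeps the average mutual information between adjacent classes highest, in the style of class-based bigram clustering. The function names (`mutual_information`, `cluster_phones`), the exhaustive pair search, and the toy bigram counts are all hypothetical choices made for illustration; the abstract does not specify these details.

```python
import math
from itertools import combinations


def mutual_information(counts, classes):
    """Average mutual information between the classes of adjacent phones.

    counts:  dict mapping (left_phone, right_phone) -> bigram count
    classes: dict mapping phone -> class label
    """
    total = sum(counts.values())
    joint, left, right = {}, {}, {}
    for (a, b), n in counts.items():
        ca, cb = classes[a], classes[b]
        joint[(ca, cb)] = joint.get((ca, cb), 0) + n
        left[ca] = left.get(ca, 0) + n
        right[cb] = right.get(cb, 0) + n
    mi = 0.0
    for (ca, cb), n in joint.items():
        if n > 0:
            # p(ca, cb) * log( p(ca, cb) / (p(ca) * p(cb)) )
            mi += (n / total) * math.log(n * total / (left[ca] * right[cb]))
    return mi


def cluster_phones(counts):
    """Greedy bottom-up clustering (assumed variant, exhaustive pair search).

    Returns the phone sets created at each merge, i.e. the internal nodes
    of the resulting binary tree, in merge order.
    """
    phones = sorted({p for pair in counts for p in pair})
    clusters = {i: frozenset([p]) for i, p in enumerate(phones)}
    next_id = len(phones)
    node_sets = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(sorted(clusters), 2):
            # Class assignment with clusters i and j merged into one class.
            assign = {}
            for cid, members in clusters.items():
                label = i if cid == j else cid
                for p in members:
                    assign[p] = label
            mi = mutual_information(counts, assign)
            if best is None or mi > best[0]:
                best = (mi, i, j)
        _, i, j = best
        merged = clusters.pop(i) | clusters.pop(j)
        clusters[next_id] = merged
        next_id += 1
        node_sets.append(merged)
    return node_sets


# Toy, made-up bigram counts: vowels (a, e) alternate with consonants (p, t).
toy_counts = {
    ("a", "p"): 10, ("a", "t"): 10, ("e", "p"): 10, ("e", "t"): 10,
    ("p", "a"): 10, ("p", "e"): 10, ("t", "a"): 10, ("t", "e"): 10,
}
question_sets = cluster_phones(toy_counts)
```

On this toy input the vowels and the consonants each end up as a node of the tree before the final merge into the full phone set. In a recognizer, each such set could serve as a candidate question of the form "is the neighboring phone in this set?" when growing the decision tree. The exhaustive search above recomputes the mutual information for every candidate pair, which is only practical for small inventories.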
Bibliographic reference. Chelba, Ciprian / Morton, Rachel (2002): "Mutual information phone clustering for decision tree induction", In ICSLP-2002, 1005-1008.