ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

A Successive State Splitting Algorithm Based on the MDL Criterion by Data-Driven and Decision Tree Clustering

Takatoshi Jitsuhiro, Tomoko Matsui, Satoshi Nakamura

Spoken Language Translation Research Laboratories, Advanced Telecommunications Research Institute International, Kyoto, Japan

We propose a new Successive State Splitting (SSS) algorithm based on the Minimum Description Length (MDL) criterion for automatically designing tied-state HMM topologies. The SSS algorithm is a mechanism for creating both temporal and contextual variations based on the Maximum Likelihood (ML) criterion. However, it requires control parameters that serve as stopping criteria, such as the total number of states, to be set empirically in advance. We introduce the MDL criterion into the ML-SSS algorithm so that it can automatically create appropriate topologies without such parameters. Experimental results show that the extended algorithm stops splitting automatically and obtains more appropriate HMM topologies than the original one. We also extend the MDL-SSS algorithm by using phonetic decision tree clustering for contextual splitting. A method combining phonetic decision tree clustering with data-driven clustering automatically achieves almost the same performance as the original method.
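
To illustrate how an MDL criterion can replace a preset state count as the stopping rule, the following is a minimal sketch, not the authors' implementation. The abstract does not give the paper's exact formula, so the sketch assumes the generic two-part MDL form (negative log-likelihood plus half the number of added parameters times the log of the total frame count) and models each state as a single 1-D Gaussian. The names `gaussian_loglik`, `mdl_change`, and `should_split` and the `params_per_state` default are hypothetical.

```python
import math


def gaussian_loglik(xs):
    """Maximized log-likelihood of xs under one 1-D Gaussian (ML mean and variance)."""
    n = len(xs)
    mean = sum(xs) / n
    # Variance floor (an assumption of this sketch) avoids log(0) on tiny clusters.
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def mdl_change(parent, left, right, total_frames, params_per_state=2):
    """Change in description length if the state holding `parent` is split in two.

    Assumed generic two-part MDL: the likelihood gain of the split is weighed
    against a penalty for the parameters of the one extra state (mean and variance).
    """
    ll_gain = gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(parent)
    penalty = 0.5 * params_per_state * math.log(total_frames)
    return -ll_gain + penalty


def should_split(parent, left, right, total_frames):
    """Accept the split only if it shortens the description length."""
    return mdl_change(parent, left, right, total_frames) < 0.0


# Hypothetical data: two well-separated clusters versus two identical ones.
separated_left = [-0.1, 0.0, 0.1, 0.05, -0.05]
separated_right = [9.9, 10.0, 10.1, 10.05, 9.95]
same_left = [-1.0, 0.0, 1.0, -0.5, 0.5]
same_right = [-1.0, 0.0, 1.0, -0.5, 0.5]

split_separated = should_split(separated_left + separated_right,
                               separated_left, separated_right, total_frames=10)
split_same = should_split(same_left + same_right,
                          same_left, same_right, total_frames=10)
```

Under these assumptions, splitting continues only while some candidate split reduces the description length, so the procedure stops on its own without a preset total number of states. The same test could be applied to candidates from data-driven clustering or from phonetic decision tree questions, since only the partition into `left` and `right` differs.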

Bibliographic reference. Jitsuhiro, Takatoshi / Matsui, Tomoko / Nakamura, Satoshi (2003): "A successive state splitting algorithm based on the MDL criterion by data-driven and decision tree clustering", in SSPR-2003, paper MAP2.