11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Probabilistic State Clustering Using Conditional Random Field for Context-Dependent Acoustic Modelling

Khe Chai Sim

National University of Singapore, Singapore

Hidden Markov Models are widely used in speech recognition systems. Because of co-articulation effects in continuous speech, context-dependent models have been found to yield performance improvements. One major issue with context-dependent acoustic modelling is the robust estimation of parameters for models that are unseen or rare in the training data. Typically, decision tree state clustering, with trees built from phonetic questions, is used to ensure that there are sufficient data for each physical state. In this paper, a conditional random field (CRF) is used to perform probabilistic state clustering, where phonetic questions serve as binary feature functions to predict the latent cluster weights. Experimental results on the Wall Street Journal corpus reveal that CRF-based state clustering outperforms conventional maximum likelihood decision tree state clustering with similar model complexity by about 10% relative.
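
The idea of predicting soft cluster weights from binary phonetic questions can be illustrated with a minimal sketch. It is not the paper's formulation: the question set, the phone classes, the number of clusters and all parameter values below are hypothetical, and a plain log-linear (softmax) model stands in for the CRF. The point is only that a context never seen in training still receives a full distribution over clusters, instead of a single hard leaf as in a decision tree.

```python
import math

# Hypothetical phone classes and questions (assumptions, not from the paper).
NASALS = {"m", "n", "ng"}
VOWELS = {"aa", "ae", "ah", "iy", "uw"}

# Each question maps a (left, centre, right) context to True/False.
QUESTIONS = [
    lambda l, c, r: l in NASALS,  # is the left context a nasal?
    lambda l, c, r: r in NASALS,  # is the right context a nasal?
    lambda l, c, r: l in VOWELS,  # is the left context a vowel?
    lambda l, c, r: r in VOWELS,  # is the right context a vowel?
]


def features(left, centre, right):
    """Binary feature vector: one 0/1 entry per phonetic question."""
    return [1.0 if q(left, centre, right) else 0.0 for q in QUESTIONS]


def cluster_weights(feats, params, biases):
    """Softmax over per-cluster linear scores of the binary features."""
    scores = [
        b + sum(w * f for w, f in zip(row, feats))
        for row, b in zip(params, biases)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative parameters for three latent clusters (made-up values).
PARAMS = [
    [2.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
]
BIASES = [0.0, 0.0, 0.0]

feats = features("n", "ae", "t")
weights = cluster_weights(feats, PARAMS, BIASES)
```

Here `weights` is a probability vector over the three clusters for the context n-ae+t. In a real system the parameters would be trained rather than set by hand, and the weights would combine the output distributions of the clusters.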


Bibliographic reference.  Sim, Khe Chai (2010): "Probabilistic state clustering using conditional random field for context-dependent acoustic modelling", In INTERSPEECH-2010, 70-73.