8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Hierarchical Acoustic Modeling Based on Random-Effects Regression for Automatic Speech Recognition

Yan Han, Lou Boves

Radboud University Nijmegen, The Netherlands

Recent research on human intelligence [1] suggests that the auditory system is hierarchically structured: the lower levels store the individual properties of utterances, and the upper levels store their group properties. However, most conventional automatic speech recognizers adopt a single-level model structure. Structure-based models, such as HMMs and parametric trajectory models, capture only the group properties of utterances, whereas template-based models exploit only the individual properties. In this paper, we propose a novel hierarchical acoustic model that simulates the human auditory hierarchy and explicitly addresses both the group and the individual properties of utterances. Furthermore, we develop two evaluation methods, a bottom-up test and a top-down test, to simulate the prediction-verification loops in human hearing. The model is evaluated on a TIMIT vowel classification task, where it significantly outperforms parametric trajectory models.

Bibliographic reference. Han, Yan / Boves, Lou (2007): "Hierarchical acoustic modeling based on random-effects regression for automatic speech recognition", in INTERSPEECH-2007, 878-881.