Recent research on human intelligence suggests that the auditory system has a hierarchical structure, in which the lower levels store the individual properties and the upper levels store the group properties of utterances. Most conventional automatic speech recognizers, however, adopt a single-level model structure. Structure-based models, such as HMMs and parametric trajectory models, capture only the group properties of utterances; template-based models exploit only the individual properties. In this paper, we propose a novel hierarchical acoustic model that simulates the human auditory hierarchy, in which both the group and the individual properties of utterances are explicitly addressed. Furthermore, we develop two evaluation methods, namely bottom-up and top-down tests, to simulate the prediction-verification loops in human hearing. The model is evaluated on a TIMIT vowel classification task, where the proposed hierarchical model significantly outperforms parametric trajectory models.
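The abstract does not spell out the estimator, but the core idea of random-effects regression for trajectories can be sketched as follows: each utterance's feature trajectory is modeled as a group-level (fixed-effect) polynomial plus a per-utterance (random-effect) deviation. The sketch below is a minimal two-stage illustration on synthetic data; all names, dimensions, and values are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Group-level (fixed-effect) polynomial trajectory: y(t) = b0 + b1*t + b2*t^2.
# Coefficients are hypothetical, loosely formant-like values in Hz.
beta = np.array([500.0, 300.0, -250.0])

def design(t):
    """Quadratic polynomial design matrix phi(t) = [1, t, t^2]."""
    return np.vstack([np.ones_like(t), t, t**2]).T

# Simulate utterances: each utterance i gets a random deviation b_i,
# so its trajectory is (beta + b_i)^T phi(t) plus observation noise.
n_utts, n_frames = 20, 30
t = np.linspace(0.0, 1.0, n_frames)
Phi = design(t)                                              # (n_frames, 3)
b = rng.normal(scale=[30.0, 20.0, 20.0], size=(n_utts, 3))   # random effects
Y = (beta + b) @ Phi.T + rng.normal(scale=5.0, size=(n_utts, n_frames))

# Two-stage estimate: fit each utterance separately by least squares
# (individual level), then average the per-utterance coefficients to
# obtain the group-level fixed effect.
coef = np.linalg.lstsq(Phi, Y.T, rcond=None)[0].T   # (n_utts, 3)
beta_hat = coef.mean(axis=0)                        # estimated fixed effect
b_hat = coef - beta_hat                             # estimated random effects

print("estimated group coefficients:", beta_hat)
```

A full mixed-effects fit would estimate the random-effect covariance jointly (e.g. by EM or restricted maximum likelihood); the two-stage average above is only the simplest way to show how group and individual properties are separated.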
Bibliographic reference. Han, Yan / Boves, Lou (2007): "Hierarchical acoustic modeling based on random-effects regression for automatic speech recognition", In INTERSPEECH-2007, 878-881.