4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper presents a mathematical framework for learning acoustic features from a central auditory representation. We adopt a statistical approach that models the learning process as maximum likelihood estimation of the signal distribution. An algorithm, called statistical matching pursuit (SMP), is introduced to identify regions on the cortical surface where the features for each sound class are most prominent. We model the features with Gaussian mixture densities, and employ the expectation-maximization (EM) procedure both to improve the parameterization and to iteratively refine the selection of cortical regions from which the features are extracted. The learning algorithm is applied to vowel classification on the TIMIT database, where all non-diphthong vowels (nine in total) are treated as individual classes. Experimental results show that models trained under the SMP/EM algorithm achieve recognition accuracy comparable to that of conventional recognizers.
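As a rough illustration of the EM step mentioned above, the following minimal sketch fits a two-component Gaussian mixture to one-dimensional synthetic data by maximum likelihood. It is not the SMP algorithm and does not use the auditory representation or TIMIT; the function names (`gaussian_pdf`, `log_likelihood`, `em_step`), the variance floor, the initialization, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration for a 1-D Gaussian mixture (illustrative only)."""
    k = len(weights)
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate weights, means, and variances from the posteriors.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # assumed floor to avoid collapse
    return new_w, new_m, new_v


# Synthetic data (an assumption): two clusters centred near 0 and 5.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(5.0, 1.0) for _ in range(200)])

# Crude initialization: means at the data extremes, unit variances.
weights, means, variances = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
history = [log_likelihood(data, weights, means, variances)]
for _ in range(25):
    weights, means, variances = em_step(data, weights, means, variances)
    history.append(log_likelihood(data, weights, means, variances))
```

Each iteration should leave the log-likelihood stored in `history` no lower than before, which is the property that makes EM a natural fit for maximum likelihood training of mixture models.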
Bibliographic reference. Wang, Kuansan / Lee, Chin-Hui / Juang, Biing-Hwang (1996): "Maximum likelihood learning of auditory feature maps for stationary vowels", In ICSLP-1996, 1265-1268.