4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Maximum Likelihood Learning of Auditory Feature Maps for Stationary Vowels

Kuansan Wang, Chin-Hui Lee, Biing-Hwang Juang

Speech Recognition Department, Lucent Technologies Bell Laboratories, Murray Hill, NJ, USA

In this paper, a mathematical framework for learning acoustic features from a central auditory representation is presented. We adopt a statistical approach that models the learning process as achieving a maximum likelihood estimate of the signal distribution. An algorithm, called statistical matching pursuit (SMP), is introduced to identify regions on the cortical surface where the features for each sound class are most prominent. We model the features with Gaussian mixture densities, and employ the expectation-maximization (EM) procedure both to improve the parameterization and to iteratively refine the selection of cortical regions from which the features are extracted. The learning algorithm is applied to vowel classification on the TIMIT database, where all the vowels (excluding diphthongs, nine in total) are regarded as individual classes. Experimental results show that models trained under the SMP/EM algorithm achieve recognition accuracy comparable to that of conventional recognizers.
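The abstract's core parameterization step, fitting Gaussian mixture densities by EM, can be sketched as follows. This is a minimal one-dimensional illustration on synthetic data, not the paper's cortical-surface features or SMP region selection; the helper name `em_gmm_1d` and the quantile-based initialization are assumptions for the example.

```python
# Minimal EM sketch for a 1-D Gaussian mixture (illustrative only; the paper
# applies EM to feature distributions drawn from auditory cortical maps).
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture to samples x via EM."""
    n = len(x)
    # Initialize: means at spread-out quantiles, uniform weights, global variance.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i),
        # computed in the log domain for numerical stability.
        diff = x[:, None] - mu[None, :]
        log_p = -0.5 * (np.log(2 * np.pi * var) + diff**2 / var) + np.log(w)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk
    return w, mu, var

# Synthetic two-class data: samples around 0 and around 5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
w, mu, var = em_gmm_1d(x, k=2)
```

In a classifier of the kind the abstract describes, one such mixture would be trained per vowel class and test tokens scored by likelihood under each class model.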


Bibliographic reference. Wang, Kuansan / Lee, Chin-Hui / Juang, Biing-Hwang (1996): "Maximum likelihood learning of auditory feature maps for stationary vowels", in Proc. ICSLP-1996, pp. 1265-1268.