Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Stochastic Perceptual Auditory-Event-Based Models for Speech Recognition

Nelson Morgan (1,2), Herve Bourlard (1), Steven Greenberg (1,2), Hynek Hermansky (1,3)

(1) International Computer Science Institute (ICSI), Berkeley, CA, USA
(2) U. of California, Berkeley, CA, USA
(3) Oregon Graduate Institute, Portland, OR, USA

We have developed a statistical model of speech that incorporates certain temporal properties of human speech perception. The primary goal of this work is to avoid several constraining assumptions made by current statistical speech recognition systems, particularly the model of speech as a sequence of stationary segments consisting of uncorrelated acoustic vectors. A focus on perceptual models may in principle allow statistical modeling of speech components that are more relevant for discriminating between candidate utterances during recognition. In particular, we hope to develop systems that share some of the robustness of human audition for speech collected under adverse conditions. We outline this new research direction here, along with some preliminary theoretical work.

Bibliographic reference. Morgan, Nelson / Bourlard, Herve / Greenberg, Steven / Hermansky, Hynek (1994): "Stochastic perceptual auditory-event-based models for speech recognition", in ICSLP-1994, 1943-1946.