Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Environmental Robustness in Automatic Speech Recognition Using Physiologically-Motivated Signal Processing

Yoshiaki Ohshima (1), Richard M. Stern (2)

(1) Tokyo Research Laboratory, IBM Japan, Ltd., Kanagawa, Japan
(2) Department of Electrical and Computer Engineering, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

This paper examines how speech recognition systems can be made more environmentally robust by analyzing the performance of Seneff's model of the auditory periphery [7]. The purpose of the paper is threefold. First, we document the extent to which the Seneff model reduces the degradation in speech recognition accuracy caused by additive noise and/or linear filtering. Second, we examine how much each component of the nonlinear neural transduction (NT) stage of the Seneff model contributes to recognition accuracy, by measuring accuracy with individual components removed from the processing. Third, we determine the extent to which the robustness provided by the Seneff model is complementary to, and independent of, the improvement in recognition accuracy already provided by successful existing acoustical pre-processing algorithms such as codeword-dependent cepstral normalization (CDCN) [1]. Experimental techniques for investigating these issues are proposed. Results of speech recognition experiments using CMU's SPHINX system [4] under real and simulated degradation are reported.


Bibliographic reference. Ohshima, Yoshiaki / Stern, Richard M. (1994): "Environmental robustness in automatic speech recognition using physiologically-motivated signal processing", In ICSLP-1994, 1347-1350.