Third International Conference on Spoken Language Processing (ICSLP 94)
It is well known that the introduction of acoustic background distortion and the variability resulting from environmentally induced stress cause speech recognition algorithms to fail. In this paper, several causes of recognition performance degradation are explored. It is suggested that recent studies based on a source generator framework can provide the necessary foundation for establishing robust speech recognition techniques. In addition, initial results are discussed from two studies that address environmental noise and stress-induced speaker perturbation in recognition. First, a novel constrained-iterative feature-estimation algorithm is considered, which is shown to produce improved speech feature characterization in a wide variety of actual noise conditions (computer fan, large crowd, and voice communication channel noise using CCDATA). Second, a neural network based processing algorithm is formulated, using one of several robust input feature sets, to detect and classify the stressed speaker state. It is shown that a successful stress classification rate of 80.6% is possible when stress conditions are combined into groups of related domains. It is suggested that such knowledge could be used to monitor speaker state and direct feature estimation for improved robustness of speech recognizers. Finally, the overall source generator framework, which models perturbation from vocal tract excitation to the environment, is discussed further.
Bibliographic reference. Hansen, John H. L. / Womack, Brian D. / Arslan, Levent M. (1994): "A source generator based production model for environmental robustness in speech recognition", In ICSLP-1994, 1003-1006.