Speech signals vary with linguistic factors as well as with acoustic factors such as additive noise, speaker individuality, environment-dependent speaking styles, and microphone and transmission characteristics. This paper surveys the main methods that have been investigated to cope with these variations, which are the major factors degrading the performance of speech recognition systems in practical use.
The following methods have been used to deal with additive noise: using special microphones, using auditory models for speech analysis and feature extraction, reducing and suppressing noise, using noise masking and adaptive models, using spectral distance measures that are robust against noise, and compensating for the spectral deviation caused by the altered speaking style that speakers adopt in noisy environments (the Lombard effect). Various methods have also been used to cope with the problems caused by differences in the characteristics of different kinds of microphones.
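As a rough illustration of the "reducing and suppressing noise" category, the sketch below applies magnitude spectral subtraction, one widely known noise-suppression technique. The abstract does not say which suppression methods the paper covers, so the choice of technique is an assumption, as are the function name `spectral_subtraction` and its parameters `noise_frames`, `over_subtraction`, and `floor`. The input is assumed to be a sequence of magnitude spectra whose first few frames contain noise only.

```python
from typing import List


def spectral_subtraction(
    frames: List[List[float]],
    noise_frames: int = 2,
    over_subtraction: float = 1.0,
    floor: float = 0.01,
) -> List[List[float]]:
    """Subtract an estimated noise magnitude spectrum from every frame.

    Hypothetical helper for illustration only. `frames` holds one
    magnitude spectrum per analysis frame; the first `noise_frames`
    frames are assumed to contain noise without speech.
    """
    if not frames or not 0 < noise_frames <= len(frames):
        raise ValueError("noise_frames must be between 1 and len(frames)")

    n_bins = len(frames[0])

    # Noise estimate: average magnitude per frequency bin over the
    # leading noise-only frames.
    noise = [
        sum(frame[k] for frame in frames[:noise_frames]) / noise_frames
        for k in range(n_bins)
    ]

    cleaned = []
    for frame in frames:
        # Subtract the scaled noise estimate, but never go below a small
        # fraction of the original magnitude (a spectral floor), which
        # avoids negative magnitudes.
        cleaned.append(
            [max(m - over_subtraction * n, floor * m) for m, n in zip(frame, noise)]
        )
    return cleaned


if __name__ == "__main__":
    # Two noise-only frames followed by one frame with energy in bin 0.
    spectra = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
    print(spectral_subtraction(spectra))
```

The spectral floor is a common design choice in this family of methods: clamping to a fraction of the input magnitude rather than to zero tends to reduce the isolated spectral peaks that plain subtraction can leave behind.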
Discourse recognition using spontaneous speech has recently attracted the attention of many researchers. In this area, it is necessary to cope with variations that are not encountered when recognizing speech read from a text. Various approaches to giving recognition systems the ability to adapt automatically to individual speakers have also been actively explored. To cope with the variation related to linguistic processing, methods of adaptation to a new task have been investigated.
Cite as: Furui, S. (1992) Toward robust speech recognition under adverse conditions. Proc. ETRW on Speech Processing in Adverse Conditions, 31-42
@inproceedings{furui92_spac, author={Sadaoki Furui}, title={{Toward robust speech recognition under adverse conditions}}, year=1992, booktitle={Proc. ETRW on Speech Processing in Adverse Conditions}, pages={31--42} }