5th International Conference on Spoken Language Processing
It is well known that the performance of speech recognition algorithms degrades in adverse environments where a speaker is under stress, emotion, or Lombard effect. This study evaluates the effectiveness of traditional features for recognizing speech under stress and formulates new features that are shown to improve stressed speech recognition. The focus is on formulating robust features that are less dependent on speaking conditions, rather than on applying compensation or adaptation techniques. The stressed speaking styles considered are simulated angry and loud speech, Lombard effect speech, and noisy actual stressed speech from the SUSAS database. In addition, this study investigates the immunity of the LP and FFT power spectra to the presence of stress. Our results show that, in contrast to the FFT's immunity to noise, the LP power spectrum is more immune than the FFT power spectrum to stress as well as to a combination of noisy and stressful conditions. Two alternative frequency partitioning methods (M-MFCC, ExpoLog) are proposed and compared with traditional MFCC features for stressed speech recognition. It is shown that the alternative filterbank frequency partitions are more effective for recognition of speech under both simulated and actual stressed conditions.
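To illustrate what a filterbank frequency partition is, the following minimal sketch computes the band edges of a traditional mel-scale filterbank, using the widely used mapping mel = 2595 log10(1 + f/700). The function names, the choice of 20 filters, and the 0 to 4000 Hz range are illustrative assumptions, not values taken from the paper. The M-MFCC and ExpoLog partitions are not defined in this abstract, so they are not reproduced here.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (common 2595/700 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel: map a mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(num_filters, f_low_hz, f_high_hz):
    """Return num_filters + 2 edge frequencies (Hz), equally spaced in mel.

    Filter i of a triangular filterbank would span edges[i] .. edges[i + 2],
    with its peak at edges[i + 1].
    """
    m_low = hz_to_mel(f_low_hz)
    m_high = hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (num_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(num_filters + 2)]


# Assumed settings for illustration only: 20 filters over 0-4000 Hz.
edges = mel_band_edges(num_filters=20, f_low_hz=0.0, f_high_hz=4000.0)
for low, high in zip(edges[:-1], edges[1:]):
    print(f"{low:8.1f} Hz  to {high:8.1f} Hz")
```

Because the edges are equally spaced on the mel scale, the bands are narrow at low frequencies and progressively wider at high frequencies. An alternative partition would replace `hz_to_mel` and `mel_to_hz` with a different pair of mappings while leaving the rest unchanged.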
Bibliographic reference. Bou-Ghazale, Sahar E. / Hansen, John H. L. (1998): "Speech feature modeling for robust stressed speech recognition", In ICSLP-1998, paper 0918.