5th International Conference on Spoken Language Processing
Stressful environments, such as aircraft cockpits or situations involving high task workload or emotion, can degrade the performance of speech recognition systems. To address this, we investigate a number of linear and nonlinear features and processing methods for stressed speech classification. The linear features include properties of pitch, duration, intensity, glottal source, and the vocal tract spectrum. Nonlinear processing is based on our newly proposed Teager Energy Operator (TEO) speech feature, which incorporates frequency-domain critical band filters and properties of the resulting TEO autocorrelation envelope. In this study, we employ Bayesian hypothesis testing and a hidden Markov model processor as classification methods. Evaluations focused on loud, angry, and Lombard effect speech from the SUSAS database. Results using ROC curves and EER-based detection show that pitch is the best of the five linear features for stress classification, while the new nonlinear TEO-based feature outperforms the best linear feature by +5.2%, with a reduction in classification rate variability from 8.66 to 3.90.
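As a rough illustration of the nonlinear processing mentioned in the abstract, the following minimal sketch applies the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], to a signal and then computes a normalized autocorrelation of the result. It is not the authors' feature extraction: the critical band filtering and the envelope measurement described in the abstract are omitted, and the function names, the test tone, and the lag range are assumptions chosen only for this example.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one for each interior sample of x.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def normalized_autocorrelation(v, max_lag):
    """Autocorrelation of v at lags 0..max_lag, scaled so that lag 0 equals 1."""
    r0 = sum(s * s for s in v)
    if r0 == 0.0:
        return [0.0] * (max_lag + 1)
    return [
        sum(v[n] * v[n + k] for n in range(len(v) - k)) / r0
        for k in range(max_lag + 1)
    ]


# Hypothetical test signal: a pure tone standing in for one band-limited channel.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(200)]

psi = teager_energy(tone)
acf = normalized_autocorrelation(psi, max_lag=20)
```

For a pure tone A cos(Ωn), the TEO output is the constant A² sin²(Ω), so the autocorrelation in this sketch falls off only because fewer samples overlap at larger lags. A signal with modulation inside the band would produce a TEO profile that varies over time, and its autocorrelation would decay faster.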
Bibliographic reference. Zhou, Guojun / Hansen, John H. L. / Kaiser, James F. (1998): "Linear and nonlinear speech feature analysis for stress classification", In ICSLP-1998, paper 0840.