9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Detection of Speech Under Physical Stress: Model Development, Sensor Selection, and Feature Fusion

Sanjay A. Patil, John H. L. Hansen

University of Texas at Dallas, USA

Speech system scenarios can require the user to perform tasks that constrain speech production and physiology, causing speaker variability and reduced speech system performance. The result is speech under stress, which differs from speech produced under neutral conditions. The stress can be physical, cognitive, or noise induced (Lombard effect). This study focuses on physical stress, with specific emphasis on: (i) the number of speakers used for modeling, (ii) alternative audio sensors, and (iii) fusion-based stress detection, using a new audio corpus (UT-Scope). First, a GMM framework is used with our previously formulated TEO-CB-AutoEnv features for neutral/physical stress detection. Second, stress detection performance is investigated for both acoustic and non-acoustic (P-MIC) sensors. Evaluations show that effective stress models can be obtained with 12 speakers out of randomly drawn set sizes of 1 to 42 subjects, with stress detection performance of 62.96% (close-talking mic) and 66.36% (P-MIC). Third, the TEO-CB-AutoEnv model scores are fused with traditional MFCC-based stress model scores using the AdaBoost algorithm, improving overall system performance by 9.43% absolute (close-talking mic) and 12.99% absolute (P-MIC). These three advances allow effective stress detection algorithms to be developed with fewer training speakers and/or alternative sensors in combined feature domains.
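
As background for the TEO-CB-AutoEnv features named in the abstract, the following is a minimal sketch of the discrete Teager energy operator (TEO) on which they build. It is not the authors' feature pipeline: the critical-band filtering and autocorrelation-envelope steps are omitted, and the function name `teager_energy` and the test signal parameters are hypothetical choices made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure cosine with assumed amplitude and
# digital frequency (radians per sample).
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(64)]

psi = teager_energy(signal)

# For a pure sinusoid the operator output is constant and equals
# amplitude^2 * sin(omega)^2, which reflects both amplitude and frequency.
expected = amplitude ** 2 * math.sin(omega) ** 2
```

For a single sinusoid the output is constant, which is why the operator is often described as tracking the energy of an oscillation from only three adjacent samples.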

Bibliographic reference.  Patil, Sanjay A. / Hansen, John H. L. (2008): "Detection of speech under physical stress: model development, sensor selection, and feature fusion", In INTERSPEECH-2008, 817-820.