Speech system scenarios can require the user to perform tasks that constrain the user's speech production and physiology, causing speaker variability and reduced speech system performance. This is speech under stress, which differs from speech produced under neutral conditions. The stress can be physical, cognitive, or noise-induced (Lombard effect). This study focuses on physical stress, with specific emphasis on: (i) the number of speakers used for modeling, (ii) alternative audio sensors, and (iii) fusion-based stress detection, using a new audio corpus (UT-Scope). We use a GMM framework with our previously formulated TEO-CB-AutoEnv features for neutral/physical stress detection, and investigate stress detection performance for both acoustic (close-talking microphone) and non-acoustic (P-MIC) sensors. Evaluations show that effective stress models can be obtained with 12 speakers out of a random size of 1-42 subjects, with stress detection performance of 62.96% (close-talking microphone) and 66.36% (P-MIC). The TEO-CB-AutoEnv model scores were then fused with traditional MFCC-based stress model scores using the AdaBoost algorithm, improving overall system performance by 9.43% absolute (close-talking microphone) and 12.99% absolute (P-MIC). These three advances allow effective stress detection algorithms to be developed with fewer training speakers and/or alternative sensors in combined feature domains.
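As background for the TEO-CB-AutoEnv features named above, the following is a minimal sketch of the standard discrete Teager energy operator (TEO), psi[n] = x[n]^2 - x[n-1]*x[n+1], which the feature name refers to. It is not the authors' feature pipeline: the critical-band filtering and autocorrelation-envelope stages are not shown, and the function name `teager_energy` and the test signal are assumptions made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two
    samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative signal (an assumption, not data from the paper):
# a pure sinusoid with amplitude A and digital frequency omega.
A, omega = 2.0, 0.3
signal = [A * math.cos(omega * n) for n in range(50)]
psi = teager_energy(signal)

# For a pure sinusoid the operator is constant: A^2 * sin(omega)^2.
expected = (A * math.sin(omega)) ** 2
```

For a pure sinusoid the output is constant and depends on both amplitude and frequency, which is the usual motivation for using the operator to track changes in speech excitation.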
Bibliographic reference. Patil, Sanjay A. / Hansen, John H. L. (2008): "Detection of speech under physical stress: model development, sensor selection, and feature fusion", In INTERSPEECH-2008, 817-820.