Most research on detecting a speaker's cognitive state during interaction with a dialog system has relied on self-reports or on hand-coded subjective judgments of audio or audio-visual observations. This study examines two questions: (1) how do undesirable system responses affect people physiologically, and (2) to what extent can physiological changes be predicted from the speech signal alone? To address these questions, we use a new corpus of simultaneous speech and high-quality physiological recordings in the product returns domain (the SRI BioFrustration Corpus). “Triggers” were used to frustrate users at specific points in the interaction, so that emotional responses occurred at similar times across participants. For each of the eight return tasks per participant, we compared speaker-normalized pre-trigger regions (cooperative system behavior) with post-trigger regions (uncooperative system behavior). Results using random forest classifiers show that changes in spectral and temporal speech features can predict heart rate changes with an accuracy of approximately 70%. Implications for future research and applications are discussed.
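The pre-trigger versus post-trigger comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state which speaker normalization was used, so per-speaker z-scoring is an assumption here, as are the function names, the `(task_id, phase, value)` record layout, and the binary "heart rate went up" label. The random forest step is omitted; the sketch covers only how normalized pre/post differences might be formed per task.

```python
from statistics import mean, pstdev


def zscore_by_speaker(values):
    """Normalize one speaker's measurements to zero mean and unit variance.

    Assumption: z-scoring stands in for the unspecified speaker normalization.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant signal carries no within-speaker variation.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def pre_post_deltas(regions):
    """Compute normalized post-trigger minus pre-trigger values per task.

    regions: list of (task_id, phase, value) tuples for ONE speaker,
             where phase is "pre" or "post" (hypothetical layout).
    Returns a dict mapping task_id to the normalized difference.
    """
    normalized = zscore_by_speaker([value for _, _, value in regions])
    by_task = {}
    for (task_id, phase, _), z in zip(regions, normalized):
        by_task.setdefault(task_id, {})[phase] = z
    # Keep only tasks that have both a pre and a post region.
    return {
        task_id: phases["post"] - phases["pre"]
        for task_id, phases in by_task.items()
        if "pre" in phases and "post" in phases
    }


def heart_rate_labels(deltas):
    """Binary target: 1 if the normalized value rose after the trigger, else 0."""
    return {task_id: int(delta > 0) for task_id, delta in deltas.items()}


# Made-up mean heart rates (beats per minute) for two tasks of one speaker.
example_regions = [
    (1, "pre", 60.0), (1, "post", 70.0),
    (2, "pre", 65.0), (2, "post", 62.0),
]
example_labels = heart_rate_labels(pre_post_deltas(example_regions))
```

The same `pre_post_deltas` helper could be applied to a speech feature (for example, a per-region pitch or energy statistic) to build the inputs, with the heart rate labels as the target for a classifier.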
Bibliographic reference. Tsiartas, Andreas / Kathol, Andreas / Shriberg, Elizabeth / Zambotti, Massimiliano de / Willoughby, Adrian (2015): "Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system", In INTERSPEECH-2015, 3715-3719.