This paper reports an analysis of possible associations between speech recognition performance and three cognitive states that arise in dialogues mediated by a speech-to-speech machine translation system. The analysis is based on a new corpus of interlingual interactions in a map task, which includes precisely synchronised speech, video, and physiological data streams (blood-volume pulse, skin conductance, electroencephalogram, and eye movements). No evidence is found that cognitive states occurring prior to utterances sent to the speech recogniser affect speech recognition performance. However, the onset of cognitive states, especially frustration, is found to be clearly associated with speech recognition performance. Given this association, methods for automatic detection of these cognitive states were explored using features of two physiological signals, features of the speech signal, and combinations of speech and physiological features. Combining biosignals yields detection performance well above the baseline (71% accuracy) when the time window is restricted to the perceived duration of the state. Extending the window to the end of the utterance following the cognitive state yields poor detection on biosignals alone, but detection improves considerably when features of the speech signal are added, showing the potential usefulness of speech features as a biosignal.
Bibliographic reference. Hayakawa, Akira / Haider, Fasih / Cerrato, Loredana / Campbell, Nick / Luz, Saturnino (2015): "Detection of cognitive states and their correlation to speech recognition performance in speech-to-speech machine translation systems", In INTERSPEECH-2015, 2539-2543.