ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Mental State Detection of Dialogue System Users Via Spoken Language

Tong Zhang, Mark Hasegawa-Johnson, Stephen E. Levinson

Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA

This paper presents an approach to detecting the mental state of children from their spoken language during interaction with a computer. The mental activities are categorized into three states: confidence, confusion, and frustration. Two knowledge sources are used in the detection. One is prosody, which indicates utterance type and the user's attitude. The other is embedded keywords/phrases, which help interpret the utterances. Moreover, children's speech exhibits acoustic characteristics very different from those of adult speech. Given this uniqueness, the paper applies a vocal-tract-length-normalization (VTLN)-based technique to compensate for both inter-speaker and intra-speaker variability in children's speech. The detected keywords/phrases are then integrated with prosodic information as cues for a MAP decision among the mental states. Tests on a set of 50 utterances collected in the project experiment showed a classification accuracy of 74%.
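To make the MAP decision concrete, the following is a minimal sketch of how keyword and prosodic evidence might be fused under a Bayes/MAP rule. The paper does not publish its models or parameters, so the Gaussian prosody models, keyword likelihoods, priors, and feature choices below are illustrative assumptions, not the authors' actual system.

```python
import numpy as np

STATES = ["confidence", "confusion", "frustration"]

# Illustrative class priors; the paper's actual values are not given.
priors = {"confidence": 0.4, "confusion": 0.35, "frustration": 0.25}

# Hypothetical diagonal-Gaussian models over a prosodic feature vector
# (e.g., F0 mean, F0 range, energy, speaking rate), one per mental state.
prosody_means = {
    "confidence":  np.array([200.0,  60.0, 0.7, 4.5]),
    "confusion":   np.array([220.0,  90.0, 0.6, 3.8]),
    "frustration": np.array([240.0, 110.0, 0.8, 4.0]),
}
prosody_var = np.array([30.0, 25.0, 0.1, 0.5]) ** 2  # shared diagonal covariance

# Hypothetical keyword likelihoods P(keyword | state).
keyword_lik = {
    "yes":  {"confidence": 0.50, "confusion": 0.20, "frustration": 0.10},
    "what": {"confidence": 0.10, "confusion": 0.50, "frustration": 0.20},
    "stop": {"confidence": 0.05, "confusion": 0.10, "frustration": 0.50},
}

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def map_state(prosody_vec, keywords):
    """Return the MAP mental state given prosodic features and spotted keywords."""
    best_state, best_score = None, -np.inf
    for s in STATES:
        # log posterior (up to a constant) = log prior + log likelihoods
        score = np.log(priors[s]) + log_gauss(prosody_vec, prosody_means[s], prosody_var)
        for w in keywords:
            if w in keyword_lik:
                score += np.log(keyword_lik[w][s])
        if score > best_score:
            best_state, best_score = s, score
    return best_state

# Usage: high F0 and wide F0 range plus the keyword "stop"
# would plausibly yield "frustration" under these toy models.
print(map_state(np.array([235.0, 105.0, 0.75, 4.1]), ["stop"]))
```

Treating the keyword and prosodic streams as conditionally independent given the state keeps the fusion to a simple sum of log likelihoods, which is a common simplification when the two knowledge sources are modeled separately.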



Bibliographic reference.  Zhang, Tong / Hasegawa-Johnson, Mark / Levinson, Stephen E. (2003): "Mental state detection of dialogue system users via spoken language", in SSPR-2003, paper MAP17.