ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
This paper presents an approach to detecting the mental states of children during their spoken-language interaction with computers. The mental states are categorized into three classes: confidence, confusion, and frustration. Two knowledge sources are used in the detection. One is prosody, which indicates utterance type and the user's attitude. The other is embedded key words/phrases, which help interpret the utterances. Moreover, children's speech exhibits acoustic characteristics very different from those of adults. Given this uniqueness, the paper applies a vocal-tract-length-normalization (VTLN)-based technique to compensate for both inter-speaker and intra-speaker variability in children's speech. The detected key words/phrases are then integrated with prosodic information as cues for a maximum a posteriori (MAP) decision on the mental state. Tests on a set of 50 utterances collected in the project experiment showed a classification accuracy of 74%.
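The MAP decision described above can be sketched in a few lines: the chosen state maximizes the prior times the likelihood of the observed cues. This is a minimal illustration only — the state priors, the cue names, and all probability values below are hypothetical placeholders, not the estimates used in the paper, and the cues are assumed conditionally independent for simplicity.

```python
# Minimal sketch of a MAP decision over the three mental states.
# All priors, cue names, and likelihood values are hypothetical and
# purely illustrative; they are NOT taken from the paper.

STATES = ["confidence", "confusion", "frustration"]

# Hypothetical prior probabilities P(state).
priors = {"confidence": 0.5, "confusion": 0.3, "frustration": 0.2}

# Hypothetical likelihoods P(cue | state) for a prosodic cue and a
# keyword cue, assumed conditionally independent given the state.
likelihoods = {
    "confidence":  {"rising_pitch": 0.2, "keyword_help": 0.1},
    "confusion":   {"rising_pitch": 0.6, "keyword_help": 0.5},
    "frustration": {"rising_pitch": 0.4, "keyword_help": 0.7},
}

def map_decision(observed_cues):
    """Return the state maximizing P(state) * prod_i P(cue_i | state)."""
    def score(state):
        p = priors[state]
        for cue in observed_cues:
            p *= likelihoods[state][cue]
        return p
    return max(STATES, key=score)

print(map_decision(["rising_pitch", "keyword_help"]))  # -> confusion
```

With these illustrative numbers, an utterance showing rising pitch together with a help-seeking keyword scores highest under "confusion" (0.3 × 0.6 × 0.5 = 0.09), so that state is selected.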
Bibliographic reference. Zhang, Tong / Hasegawa-Johnson, Mark / Levinson, Stephen E. (2003): "Mental state detection of dialogue system users via spoken language", in SSPR-2003, paper MAP17.