The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes

Jeesun Kim, Chris Davis


It has recently been argued that speakers use conventionalized forms to express different prosodic attitudes [1]. We examined this claim by looking at across-speaker consistency in the expression of auditory and visual (head and face motion) prosodic attitudes produced on multiple occasions. Specifically, we examined the acoustic and motion profiles of a female and a male speaker expressing six different prosodic attitudes, with four within-session repetitions across four different sessions. We used the same acoustic features as [1]; visual prosody was assessed by examining the patterns of the speakers' mouth, eyebrow and head movements. There was considerable variation in how prosody was realized across speakers, with one speaker's productions more discriminable than the other's. Within-session variation for both the acoustic and movement data was smaller than across-session variation, suggesting that short-term memory plays a role in consistency. The expression of some attitudes was less variable than that of others, and better discrimination was obtained with the acoustic than with the visual data, although certain visual features (e.g., eyebrow motion) provided better discrimination than others.


DOI: 10.21437/Interspeech.2016-1505

Cite as

Kim, J., Davis, C. (2016) The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes. Proc. Interspeech 2016, 57-61.

Bibtex
@inproceedings{Kim+2016,
  author={Jeesun Kim and Chris Davis},
  title={The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-1505},
  url={http://dx.doi.org/10.21437/Interspeech.2016-1505},
  pages={57--61}
}