Talkers can express different meanings or emotions without changing what is said by changing how it is said (by using both auditory and/or visual speech cues). Typically, cue strength differs between the auditory and visual channels: linguistic prosody (expression) is clearest in audition; emotional prosody is clearest visually. We investigated how well perceivers can match auditory and visual linguistic and emotional prosodic signals. Previous research showed that perceivers can match linguistic visual and auditory prosody reasonably well. The current study extended this by also testing how well auditory and visual spoken emotion expressions could be matched. Participants were presented a pair of sentences (consisting of the same segmental content) spoken by the same talker and were required to decide whether the pair had the same prosody. Twenty sentences were tested with two types of prosody (emotional vs. linguistic), two talkers, and four matching conditions: auditory-auditory (AA); visual-visual (VV); auditory-visual (AV); and visual-auditory (VA). Linguistic prosody was accurately matched in all conditions. Matching emotional expressions was excellent for VV, poorer for VA, and near chance for AA and AV presentations. These differences are discussed in terms of the relationship between types of auditory and visual cues and task effects.
Bibliographic reference. Simonetti, Simone / Kim, Jeesun / Davis, Chris (2015): "Cross-modality matching of linguistic and emotional prosody", In INTERSPEECH-2015, 56-59.