EUROSPEECH 2003 - INTERSPEECH 2003
The long-term purpose of this study is to determine whether there are "visual" cues to prosody. An audiovisual corpus was recorded from a male native speaker of French. The sentences had a subject-verb-object (SVO) syntactic structure. Four conditions were studied: focus on each phrase (S, V, O) and no focus. Both normal and reiterant speech modes were recorded. We first measured F0, duration and intensity to validate the corpus. The pitch maximum over the utterance generally fell on a focused syllable, and duration and intensity were greater for focused syllables. Lip aperture and jaw opening were then extracted from the video. The jaw opening maximum generally fell on one of the focused syllables, but peak velocity was more consistently correlated with focus. Moreover, lip closure duration was longer for the first segment of the focused phrase. We can therefore assume that prosody has visual aspects that may be used in communication.
Bibliographic reference. Dohen, Marion / Loevenbruck, Hélène / Cathiard, Marie-Agnes / Schwartz, Jean-Luc (2003): "Potential audiovisual correlates of contrastive focus in French", In EUROSPEECH-2003, 145-148.