AVSP 2003 - International Conference on Audio-Visual Speech Processing
September 4-7, 2003
The purpose of this study is to determine whether the visual modality is useful for the perception of prosody. An audio-visual corpus was recorded from a male native French speaker. The sentences had a subject-verb-object (SVO) syntactic structure. Four contrastive focus conditions were studied: focus on each phrase (S, V or O) and no focus. Both normal and reiterant speech modes were recorded. We first measured fundamental frequency (F0), duration and intensity to validate the corpus. Lip aperture and jaw opening were then extracted from the video data. The articulatory analysis enabled us to suggest a set of possible visual cues to focus: a) large jaw opening gestures and high opening velocities on all the syllables of the focused phrase; b) a long initial lip closure; and c) hypo-articulation (reduced jaw opening and duration) of the phrases following the focused one. A perception test was then developed to determine whether subjects could perceive focus through the visual modality alone. It showed that a) contrastive focus was well perceived visually for reiterant speech; b) no training was necessary; and c) subject focus was slightly easier to identify than the other focus conditions. We also found that the presence and salience of the visual cues enhance perception.
Bibliographic reference. Dohen, Marion / Loevenbruck, Hélène / Cathiard, Marie-Agnès / Schwartz, Jean-Luc (2003): "Audiovisual perception of contrastive focus in French", In AVSP 2003, 245-250.