This paper presents the construction and first evaluation of a tone-balanced audio-visual corpus of social affect in Vietnamese. This under-resourced tonal language exhibits specific glottalization and co-articulation phenomena, whose interactions with attitudinal prosody are of particular interest. A well-controlled recording methodology was designed to build a large, representative audio-visual corpus covering 16 attitudes produced by one speaker. A perception experiment was carried out to evaluate the speaker's perceived performance and to study the role and integration of audio, visual, and audio-visual information in the listener's perception of the speaker's attitudes. The results reveal characteristics of Vietnamese prosodic attitudes and allow us to investigate such social affect in the Vietnamese language.
Bibliographic reference. Mac, Dang-Khoa / Aubergé, Véronique / Rilliard, Albert / Castelli, Eric (2009): "Audio-visual prosody of social attitudes in Vietnamese: building and evaluating a tones balanced corpus", in INTERSPEECH-2009, 2263-2266.