INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Audio-Visual Prosody of Social Attitudes in Vietnamese: Building and Evaluating a Tones Balanced Corpus

Dang-Khoa Mac (1), Véronique Aubergé (1), Albert Rilliard (2), Eric Castelli (3)

(1) LIG, France
(2) LIMSI, France
(3) MICA, Vietnam

This paper presents the construction and a first evaluation of a tone-balanced audio-visual corpus of social affect in Vietnamese. This under-resourced tonal language has specific glottalization and co-articulation phenomena, whose interactions with the prosody of attitudes are of particular interest. A well-controlled recording methodology was designed to build a large, representative audio-visual corpus covering 16 attitudes, produced by one speaker. A perception experiment was carried out to evaluate the speaker's perceived performance and to study the role and integration of audio, visual, and audio-visual information in listeners' perception of the speaker's attitudes. The results reveal characteristics of Vietnamese prosodic attitudes and allow us to investigate such social affect in the Vietnamese language.

Bibliographic reference. Mac, Dang-Khoa / Aubergé, Véronique / Rilliard, Albert / Castelli, Eric (2009): "Audio-visual prosody of social attitudes in Vietnamese: building and evaluating a tones balanced corpus", In INTERSPEECH-2009, 2263-2266.