This paper presents the construction and a first evaluation of a tone-balanced audio-visual corpus of social affect in Vietnamese. This under-resourced tonal language exhibits specific glottalization and co-articulation phenomena, and how these interact with the prosody of attitudes is a particularly interesting question. A well-controlled recording methodology was designed to build a large, representative audio-visual corpus covering 16 attitudes produced by one speaker. A perception experiment was carried out for two purposes. The first was to evaluate how well the speaker's attitudes were perceived. The second was to study the role and integration of audio, visual, and audio-visual information in listeners' perception of those attitudes. The results reveal characteristics of Vietnamese prosodic attitudes and allow us to investigate such social affect in Vietnamese.
Cite as: Mac, D.-K., Aubergé, V., Rilliard, A., Castelli, E. (2009) Audio-visual prosody of social attitudes in vietnamese: building and evaluating a tones balanced corpus. Proc. Interspeech 2009, 2263-2266, doi: 10.21437/Interspeech.2009-642
@inproceedings{mac09_interspeech, author={Dang-Khoa Mac and Véronique Aubergé and Albert Rilliard and Eric Castelli}, title={{Audio-visual prosody of social attitudes in vietnamese: building and evaluating a tones balanced corpus}}, year=2009, booktitle={Proc. Interspeech 2009}, pages={2263--2266}, doi={10.21437/Interspeech.2009-642} }