Perception of prosodic transformation for Japanese social affects

Dominique Fourer, Takaaki Shochi, Jean-Luc Rouas, Albert Rilliard

This paper addresses the perception of 'genuine' versus 'synthetic' social affects. Our ultimate aim is to create self-study language-learning software that includes a tool allowing learners to hear their own voice producing a social affect correctly. Towards this goal, we study how synthetic stimuli can be constructed from neutral voices and prosodic parameters, and whether such stimuli are recognized well enough by native listeners. First, we explain how our corpus is built around contextual scenarios and describe the recording protocol. Then, we explain how the synthetic stimuli are constructed. These stimuli must satisfy several constraints: keeping the original speaker identity, preserving the linguistic content, and achieving the best possible quality. Results from a perception experiment with native speakers of Japanese show that social affects in natural stimuli are quite well recognized, whereas results for synthetic stimuli vary more, depending on the social affect considered. Some social affects may indeed be expressed so subtly that they are difficult to synthesize. A statistical analysis is proposed to show where the main difficulties lie.

DOI: 10.21437/SpeechProsody.2016-203

