12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Rapid Adaptation of Foreign-Accented HMM-Based Speech Synthesis

Reima Karhila (1), Mirjam Wester (2)

(1) Aalto University, Finland
(2) University of Edinburgh, UK

This paper presents findings on listeners' perception of speaker identity in synthetic speech. Specifically, we investigated how the perceived identity of a speaker is affected when the synthetic stimuli are created using differently accented average voice models and limited amounts (five and fifteen sentences) of the speaker's data. A speaker discrimination task was used to measure speaker identity. Native English listeners were presented with natural and synthetic speech stimuli in English and were asked to decide whether the sentences were spoken by the same person or not. An accent rating task was also carried out to measure the perceived accents of the synthetic speech stimuli. The results show that listeners, for the most part, perform as well at speaker discrimination when the stimuli have been created using five or fifteen adaptation sentences as when using 105 sentences. Furthermore, the accent of the average voice model does not affect listeners' speaker discrimination performance, even though the accent rating task shows that listeners perceive different accents in the synthetic stimuli. Listeners do not base their speaker similarity decisions on perceived accent.

Bibliographic reference. Karhila, Reima / Wester, Mirjam (2011): "Rapid adaptation of foreign-accented HMM-based speech synthesis", in INTERSPEECH-2011, 2801-2804.