EUROSPEECH 2001 Scandinavia
An English utterance was synthesized in four versions using sets of diphones produced under four different prosodic and contextual conditions. The synthesis used either accented diphones only or appropriately located accented and unaccented diphones, with each of these conditions being repeated using neutral-context and differentiated-context diphones. The four versions were presented to two listener groups, one native English and one non-native, for paired-comparison acceptability judgements. The results show a massive preference for the stress- and context-differentiated condition. Both stress and context had a significant effect on acceptability judgements, but context differentiation raised acceptability more strongly than stress differentiation. Both the native listeners and the main sub-group of non-native listeners judged the stimuli in essentially the same way.
Bibliographic reference. Barry, William / Nielsen, Claus / Andersen, Ove (2001): "Must diphone synthesis be so unnatural?", In EUROSPEECH-2001, 975-978.