ISCA Archive Eurospeech 2001

Must diphone synthesis be so unnatural?

William Barry, Claus Nielsen, Ove Andersen

An English utterance was synthesized in four versions using sets of diphones produced under four different prosodic and contextual conditions. The synthesis used either accented diphones only or appropriately located accented and unaccented diphones, and each of these two conditions was realized with both neutral-context and differentiated-context diphones. The four versions were presented to two listener groups, one native English and one non-native, for paired-comparison acceptability judgements. The results show a massive preference for the stress- and context-differentiated condition. Both stress and context had a significant effect on acceptability judgements, but context differentiation raised acceptability more strongly than stress differentiation. The native listeners and the main sub-group of non-native listeners judged the stimuli in essentially the same way.


doi: 10.21437/Eurospeech.2001-259

Cite as: Barry, W., Nielsen, C., Andersen, O. (2001) Must diphone synthesis be so unnatural? Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 975-978, doi: 10.21437/Eurospeech.2001-259

@inproceedings{barry01_eurospeech,
  author={William Barry and Claus Nielsen and Ove Andersen},
  title={{Must diphone synthesis be so unnatural?}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={975--978},
  doi={10.21437/Eurospeech.2001-259}
}