ISCA Archive ICSLP 1998

Prosodic vs. segmental contributions to naturalness in a diphone synthesizer

H. Timothy Bunnell, Steve R. Hoskins, Debra Yarrington

The relative contributions of segmental versus prosodic factors to the perceived naturalness of synthetic speech were measured by transplanting prosody between natural speech and the output of a diphone synthesizer. A small corpus was created containing matched sentence pairs wherein one member of the pair was a natural utterance and the other was a synthetic utterance generated with diphone data from the same talker. Two additional sentences were formed from each sentence pair by transplanting the prosodic structure between the natural and synthetic members of each pair. In two listening experiments, subjects were asked to (a) classify each sentence as "natural" or "synthetic", or (b) rate the naturalness of each sentence. Results showed that prosodic information was more important than segmental information in both classification and ratings of naturalness.
doi: 10.21437/ICSLP.1998-15

Cite as: Bunnell, H.T., Hoskins, S.R., Yarrington, D. (1998) Prosodic vs. segmental contributions to naturalness in a diphone synthesizer. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0857, doi: 10.21437/ICSLP.1998-15

@inproceedings{bunnell98_icslp,
  author={H. Timothy Bunnell and Steve R. Hoskins and Debra Yarrington},
  title={{Prosodic vs. segmental contributions to naturalness in a diphone synthesizer}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0857},
  doi={10.21437/ICSLP.1998-15}
}