ISCA Archive ICSLP 2000

Automatic labelling of voice-quality in speech databases for synthesis

Nick Campbell, Toru Marumoto

A series of experiments was performed to determine the extent to which voice-quality differences could be labelled automatically in a speech database. Using speech corpora of three different speaking styles from the same speaker as test material, hidden Markov models were trained to distinguish the prosodic and acoustic characteristics of each style, and were then used to re-label the voiced segments in order to provide a single, merged, labelled corpus. Perceptual tests of speech synthesised by concatenation using CHATR showed that both prosodic and voice-quality cues to stylistic variation (in this case emotion) can be detected and labelled by the trained models. However, speech synthesised from the original separate databases was perceived as being more expressive.
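
The re-labelling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes a single diagonal Gaussian per style for the hidden Markov models used in the paper, and the style names, feature choice (mean F0 and spectral tilt), and all numeric values are hypothetical. It only shows the general idea of training one model per speaking style and assigning each segment the label of the best-scoring model.

```python
import math

def fit_gaussian(vectors):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # A variance floor keeps the log-likelihood finite for constant features.
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(vector, model):
    """Log-likelihood of one feature vector under a diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vector, means, variances)
    )

def relabel(segments, models):
    """Give each segment the label of the style model that scores it highest."""
    return [
        max(models, key=lambda style: log_likelihood(seg, models[style]))
        for seg in segments
    ]

# Hypothetical toy features per segment: (mean F0 in Hz, spectral tilt in dB).
training = {
    "style_a": [(120.0, -12.0), (125.0, -11.0), (118.0, -13.0)],
    "style_b": [(180.0, -6.0), (175.0, -7.0), (185.0, -5.0)],
    "style_c": [(100.0, -18.0), (95.0, -19.0), (105.0, -17.0)],
}
models = {style: fit_gaussian(vecs) for style, vecs in training.items()}

# Segments from a merged corpus whose style labels are unknown.
labels = relabel([(122.0, -12.5), (178.0, -6.5), (98.0, -18.5)], models)
print(labels)
```

A sequence model such as an HMM would score the frame-by-frame trajectory of each segment rather than a single summary vector; the per-style training and maximum-likelihood label assignment shown here would stay the same in outline.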


Cite as: Campbell, N., Marumoto, T. (2000) Automatic labelling of voice-quality in speech databases for synthesis. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 468-471

@inproceedings{campbell00b_icslp,
  author={Nick Campbell and Toru Marumoto},
  title={{Automatic labelling of voice-quality in speech databases for synthesis}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={4},
  pages={468--471}
}