ISCA Archive Interspeech 2013

Expressive speech synthesis in MARY TTS using audiobook data and emotionML

Marcela Charfuelan, Ingmar Steiner

This paper describes a framework for synthesis of expressive speech based on MARY TTS and Emotion Markup Language (EmotionML). We describe the creation of expressive unit selection and HMM-based voices from audiobook data. The audiobook data is segmented into sentences, which are labelled and split by voice style using principal component analysis (PCA) of acoustic features extracted from each sentence. We introduce the implementation of EmotionML in MARY TTS and explain how it is used to represent and control expressivity in terms of discrete emotions or emotion dimensions. Preliminary results on the perception of different voice styles are presented.
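
As an illustration of the PCA-based labelling step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not say which acoustic features are used or how PCA scores are mapped to voice styles, so the three features (mean F0, energy, speaking rate), the synthetic data, the function names, and the quantile binning of the first principal component are all assumptions made for the example.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project per-sentence feature rows onto their first principal components."""
    X = np.asarray(features, dtype=float)
    # Standardise each feature so that differences in scale do not dominate.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardised data: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def label_by_pc1(features, n_styles=3):
    """Assumed labelling rule: bin the PC1 score into equal-sized quantile groups."""
    pc1 = pca_project(features, n_components=1)[:, 0]
    # Interior quantile edges, e.g. the 1/3 and 2/3 quantiles for three styles.
    edges = np.quantile(pc1, np.linspace(0, 1, n_styles + 1)[1:-1])
    return np.digitize(pc1, edges)


# Hypothetical data: 30 sentences x (mean F0 in Hz, energy in dB, speaking rate).
rng = np.random.default_rng(0)
features = rng.normal([120.0, 60.0, 5.0], [20.0, 5.0, 1.0], size=(30, 3))
labels = label_by_pc1(features)  # one style index (0, 1 or 2) per sentence
```

In this sketch, each sentence receives a style index that could then be used to split the corpus into style-specific subsets for voice building; any real labelling scheme would need to be checked against listening tests or manual annotation.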


doi: 10.21437/Interspeech.2013-395

Cite as: Charfuelan, M., Steiner, I. (2013) Expressive speech synthesis in MARY TTS using audiobook data and emotionML. Proc. Interspeech 2013, 1564-1568, doi: 10.21437/Interspeech.2013-395

@inproceedings{charfuelan13_interspeech,
  author={Marcela Charfuelan and Ingmar Steiner},
  title={{Expressive speech synthesis in MARY TTS using audiobook data and emotionML}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1564--1568},
  doi={10.21437/Interspeech.2013-395}
}