We are interested in emphasis for text-to-speech synthesis. In speech-to-speech translation, emphasising the correct words is important to convey the underlying meaning of a message. In this paper, we propose to use a generalised command-response (CR) model of intonation to generate emphasis in synthetic speech. We first analyse the differences in the model parameters between emphasised words in an acted emphasis scenario and their neutral counterparts. We then investigate word-level intonation modelling, using a simple random forest as the basic framework, to predict the parameters of the model for emphasised words. Based on the linguistic context of the words we want to emphasise, we attempt to recover emphasis patterns in the intonation of originally neutral synthetic speech by generating word-level model parameters from words with similar context. The method is presented and initial results are given on synthetic speech.
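To make the idea of word-level intonation atoms more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that each word's accent command produces a gamma-shaped atom that is added to a log-F0 baseline. This is one common way of formulating a generalised CR model. Everything specific in the sketch is hypothetical: the kernel order `k=6`, the scale `theta`, the 120 Hz baseline, the atom values, and the names `gamma_atom`, `synthesise_logf0` and `emphasise`. Emphasis is imitated here by scaling the amplitude of the emphasised word's atom by a fixed gain.

```python
import math

def gamma_atom(t, k=6, theta=0.02):
    # Hypothetical atom shape: gamma kernel t^(k-1) * exp(-t/theta), zero before
    # its onset, scaled so its maximum (at the mode t = (k-1)*theta) equals 1.
    if t <= 0.0:
        return 0.0
    mode = (k - 1) * theta
    value = t ** (k - 1) * math.exp(-t / theta)
    peak = mode ** (k - 1) * math.exp(-mode / theta)
    return value / peak

def synthesise_logf0(atoms, times, base_logf0=math.log(120.0), k=6, theta=0.02):
    # Each atom is an (onset_time_s, amplitude) pair; the contour is the base
    # log-F0 plus the superposition of all amplitude-scaled atoms.
    return [base_logf0 + sum(a * gamma_atom(t - t0, k, theta) for t0, a in atoms)
            for t in times]

def emphasise(atoms, word_index, gain=2.0):
    # Return a new atom list in which one word's atom amplitude is scaled by `gain`.
    return [(t0, a * gain if i == word_index else a) for i, (t0, a) in enumerate(atoms)]

# Usage with made-up values: three words, one accent atom each, 5 ms frames over 1 s.
times = [i * 0.005 for i in range(200)]
neutral = [(0.10, 0.20), (0.40, 0.15), (0.70, 0.10)]
emph = emphasise(neutral, word_index=1)
f0_neutral = [math.exp(x) for x in synthesise_logf0(neutral, times)]
f0_emph = [math.exp(x) for x in synthesise_logf0(emph, times)]
```

A fixed gain is only a crude stand-in for the random-forest prediction described above. In the approach the abstract outlines, the new word-level parameters would instead be predicted from the linguistic context of the word to be emphasised.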
Cite as: Honnet, P.-E., Garner, P.N. (2016) Emphasis recreation for TTS using intonation atoms. Proc. 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9), 14-20, doi: 10.21437/SSW.2016-3
@inproceedings{honnet16_ssw,
  author={Pierre-Edouard Honnet and Philip N. Garner},
  title={{Emphasis recreation for TTS using intonation atoms}},
  year=2016,
  booktitle={Proc. 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9)},
  pages={14--20},
  doi={10.21437/SSW.2016-3}
}