This paper explores the use of micro-prosody to improve the quality of synthesised speech in concatenative text-to-speech (TTS) synthesis systems. Micro-prosody is defined as the prosodic signals within context-dependent triphone units and across neighbouring triphones. Micro-prosody parameters are modelled using a Markovian model whose state distributions depend on the current linguistic-prosodic state as well as on the current and neighbouring phones. Various speech unit selection criteria for the design of the TTS sound inventory are explored, together with their effects on reducing the variance of micro-prosodic parameters in concatenated speech and on the TTS output speech. The variability of the prosodic parameters in the recorded speech samples from a given speaker is also considered, as is the influence of accents, such as US- and UK-accented English, on prosody variability and on the design of TTS systems.
Cite as: Chen, A., Wong, S.L., Vaseghi, S., Ho, C. (1999) Decision tree micro-prosody structures for text to speech synthesis. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1615-1618, doi: 10.21437/Eurospeech.1999-366
@inproceedings{chen99d_eurospeech,
  author={Aimin Chen and Shu Lian Wong and Saeed Vaseghi and Charles Ho},
  title={{Decision tree micro-prosody structures for text to speech synthesis}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1615--1618},
  doi={10.21437/Eurospeech.1999-366}
}