Sixth European Conference on Speech Communication and Technology
This paper explores the use of micro-prosody to improve the quality of synthesised speech in concatenative text-to-speech (TTS) synthesis systems. Micro-prosody is defined as the prosodic signals within context-dependent triphone units and across neighbouring triphones. Micro-prosodic parameters are modelled using a Markovian model whose state distributions depend on the current linguistic-prosodic state as well as on the current and neighbouring phones. The paper explores various speech unit selection criteria for designing the TTS sound inventory, and their effects both on reducing the variance of micro-prosodic parameters in concatenated speech and on the TTS output speech. It also considers how the prosodic parameters vary across the recorded samples of a given speaker, and how accents such as US and UK accented English influence speech prosody variability and the design of TTS systems.
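The abstract does not specify the model's exact form. The Python sketch below is therefore one minimal reading of "state distributions that depend on the current linguistic-prosodic state and on the current and neighbouring phones", built under explicit assumptions. Each phone carries one scalar micro-prosodic value, such as a log-F0 offset. Prosodic states follow first-order Markov transitions. Emissions are Gaussians keyed by (prosodic state, triphone). The names `MicroProsodyModel`, `state_backoff`, the `"sil"` edge padding, and the state labels are all hypothetical. The simple per-state backoff for unseen triphones stands in for the decision-tree clustering named in the paper's title and is not the authors' method.

```python
import math
from typing import Dict, List, Tuple

Triphone = Tuple[str, str, str]  # (left phone, current phone, right phone)


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


class MicroProsodyModel:
    """Hypothetical sketch: Markov transitions over linguistic-prosodic states,
    with Gaussian emissions conditioned on (prosodic state, triphone context)."""

    def __init__(self,
                 initial: Dict[str, float],
                 transitions: Dict[Tuple[str, str], float],
                 emissions: Dict[Tuple[str, Triphone], Tuple[float, float]],
                 state_backoff: Dict[str, Tuple[float, float]]):
        self.initial = initial              # P(first state)
        self.transitions = transitions      # P(state_t | state_{t-1})
        self.emissions = emissions          # (mean, variance) per (state, triphone)
        self.state_backoff = state_backoff  # assumed fallback for unseen triphones

    def emission_params(self, state: str, tri: Triphone) -> Tuple[float, float]:
        # Use the triphone-specific distribution if seen, else the state-level one.
        return self.emissions.get((state, tri), self.state_backoff[state])

    def log_likelihood(self, states: List[str], phones: List[str],
                       values: List[float]) -> float:
        """Log-likelihood of one fully labelled micro-prosody track
        (states are given, so no hidden-state search is performed)."""
        padded = ["sil"] + phones + ["sil"]  # assumed silence at utterance edges
        total = math.log(self.initial[states[0]])
        for i, (state, x) in enumerate(zip(states, values)):
            if i > 0:
                total += math.log(self.transitions[(states[i - 1], state)])
            tri = (padded[i], padded[i + 1], padded[i + 2])
            mean, var = self.emission_params(state, tri)
            total += gaussian_logpdf(x, mean, var)
        return total
```

For example, a two-phone utterance `["a", "b"]` labelled with states `["S", "T"]` scores the first phone against the `("sil", "a", "b")` distribution for state `S`. It scores the second against `("a", "b", "sil")` for state `T`, or against `T`'s backoff if that context was never seen. Tighter per-context variances in such a model would be one way to quantify the variance reduction the abstract attributes to unit selection.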
Bibliographic reference. Chen, Aimin / Wong, Shu Lian / Vaseghi, Saeed / Ho, Charles (1999): "Decision tree micro-prosody structures for text to speech synthesis", In EUROSPEECH'99, 1615-1618.