Phonetically conditioned prosody transplantation for TTS: 2-stage phone-level unit-selection framework

Mythri Thippareddy, M. G. Khanum Noor Fathima, D. N. Krishna, A. Sricharan, V. Ramasubramanian


We propose a framework for prosody transplantation in TTS, namely 2-stage phone-level unit-selection, which transfers the prosody from a 'target' prosody database onto a conventional TTS output unit-sequence. The framework employs 'phonetic conditioning', wherein target prosody-profiles are identified conditioned on their underlying phonetic content over variable-length time-scales that tend to be as long as possible. In this 2-stage framework, the units determined by a 1st-stage conventional unit-selection are mapped, via a phone-level unit-selection, to units in a 2nd-stage prosodic-style database. The retrieved units carry the prosody that represents the prosodic style of the 2nd-stage database, and this selected prosody is then imposed on the 1st-stage units. We recently proposed this framework, with early qualitative results indicating the viability of the approach. In this paper, we elaborate on the approach, characterize the performance of the proposed framework using various objective measures against prosodic ground truth and with respect to the system parameters, and show that it realizes the target prosody effectively.
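The 2-stage mapping described above can be illustrated with a toy sketch. This is not the authors' implementation: the `Unit` record, the `longest_match` and `transplant` helpers, the use of mean F0 and duration as the only prosody features, and the greedy longest-match search (standing in for a proper unit-selection cost search) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Unit:
    phone: str   # phone label of the unit
    f0: float    # mean pitch in Hz (toy prosody feature, assumed)
    dur: float   # duration in seconds (toy prosody feature, assumed)


def longest_match(phones: List[str], db: List[Unit], start: int) -> Tuple[int, int]:
    """Return (db_index, length) of the longest run of db units whose
    phone labels match phones[start:]. Length 0 means no match."""
    best_idx, best_len = -1, 0
    for j in range(len(db)):
        k = 0
        while (start + k < len(phones) and j + k < len(db)
               and db[j + k].phone == phones[start + k]):
            k += 1
        if k > best_len:
            best_idx, best_len = j, k
    return best_idx, best_len


def transplant(stage1: List[Unit], prosody_db: List[Unit]) -> List[Unit]:
    """Greedy stand-in for the 2nd stage: cover the 1st-stage phone sequence
    with phone runs from the prosody database that are as long as possible,
    and copy their prosody onto the 1st-stage units."""
    phones = [u.phone for u in stage1]
    out: List[Unit] = []
    i = 0
    while i < len(stage1):
        j, n = longest_match(phones, prosody_db, i)
        if n == 0:
            # Phone absent from the prosody database: keep original prosody.
            out.append(stage1[i])
            i += 1
            continue
        for k in range(n):
            src = prosody_db[j + k]
            out.append(Unit(stage1[i + k].phone, src.f0, src.dur))
        i += n
    return out
```

Preferring the longest phonetically matching run is one simple way to mimic conditioning over time-scales "as long as possible"; a real system would more plausibly weigh target and concatenation costs over the whole sequence.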


DOI: 10.21437/SpeechProsody.2016-160

Cite as

Thippareddy, M., Noor Fathima, M.G.K., Krishna, D.N., Sricharan, A., Ramasubramanian, V. (2016) Phonetically conditioned prosody transplantation for TTS: 2-stage phone-level unit-selection framework. Proc. Speech Prosody 2016, 781-785.

BibTeX
@inproceedings{Thippareddy+2016,
author={Mythri Thippareddy and M. G. Khanum Noor Fathima and D. N. Krishna and A. Sricharan and V. Ramasubramanian},
title={Phonetically conditioned prosody transplantation for TTS: 2-stage phone-level unit-selection framework},
year=2016,
booktitle={Speech Prosody 2016},
doi={10.21437/SpeechProsody.2016-160},
url={http://dx.doi.org/10.21437/SpeechProsody.2016-160},
pages={781--785}
}