This paper is devoted to modeling the prosody of whispered Russian speech. The practical purpose of this research is to extend voice cloning techniques to the whispered speech modality. The authors analyze the prosodic features that contribute to the expression of sentence-type intonation in whispered speech. The investigation covers intonation contours in complete and incomplete declaratives, as well as in interrogatives and exclamations. Since the fundamental frequency is absent in whisper, the major role in conveying sentence-type intonation is taken over by formant values. To model the prosody of whispered speech, an extension of the Accent Unit Portrait Model is proposed. The paper demonstrates how melodic, rhythmic and dynamic (energy) portraits of accent units can be built and used by a concatenative text-to-speech synthesizer to modify whispered speech.
Index Terms: whispered speech, prosody modeling, speech synthesis, accent unit portrait model, formant modification.
Cite as: Petrushin, V.A., Tsirulnik, L.I., Makarova, V. (2010) Whispered speech prosody modeling for TTS synthesis. Proc. Speech Prosody 2010, paper 288
@inproceedings{petrushin10_speechprosody,
  author={Valery A. Petrushin and Liliya I. Tsirulnik and Veronika Makarova},
  title={{Whispered speech prosody modeling for TTS synthesis}},
  year=2010,
  booktitle={Proc. Speech Prosody 2010},
  pages={paper 288}
}