10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

New Method for Delexicalization and its Application to Prosodic Tagging for Text-to-Speech Synthesis

Martti Vainio (1), Antti Suni (1), Tuomo Raitio (2), Jani Nurminen (3), Juhani Järvikivi (4), Paavo Alku (2)

(1) University of Helsinki, Finland
(2) Helsinki University of Technology, Finland
(3) Nokia Devices R&D, Finland
(4) Max Planck Institute for Psycholinguistics, The Netherlands

This paper describes a new, flexible delexicalization method based on a glottal-excited parametric speech synthesis scheme. The system uses inverse-filtered glottal flow and all-pole modelling of the vocal tract. The method makes it possible to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, these features include voice quality, which has not been properly modelled in earlier delexicalization methods. The method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
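To make the source-filter idea concrete, the following is a minimal sketch of all-pole modelling and inverse filtering on a single speech frame. It is not the authors' implementation: the abstract does not specify the glottal inverse filtering algorithm, so plain linear prediction (autocorrelation method with the Levinson-Durbin recursion) is used here as a stand-in. The function names `lpc_coefficients`, `inverse_filter`, and `synthesize`, as well as the choice of a Hamming window, are assumptions made for illustration only.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def lpc_coefficients(frame, order):
    """Estimate A(z) = 1 + a1*z^-1 + ... + ap*z^-p for one frame.

    Assumption: plain LPC on a Hamming-windowed frame, solved with the
    Levinson-Durbin recursion. This stands in for the vocal tract model;
    it is not the glottal inverse filtering used in the paper.
    """
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    r = autocorrelation(windowed, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a


def inverse_filter(frame, a):
    """Apply the FIR filter A(z) to strip the estimated spectral envelope."""
    return [sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(frame))]


def synthesize(excitation, a):
    """Apply the all-pole filter 1/A(z) to an excitation signal."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out.append(acc)
    return out
```

In this simplified picture, the residual returned by `inverse_filter` plays the role of the excitation signal, and passing an excitation through `synthesize` with a modified or neutral filter is one conceivable way to keep prosodic cues while discarding lexical content. The actual system described in the paper may differ in every one of these steps.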

