EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

Modeling Speaking Rate for Voice Fonts

Ashish Verma, Arun Kumar

Indian Institute of Technology, India

A voice font is created and stored for a speaker so that speech can later be synthesized in that speaker's voice. Its most important descriptors are the spectral envelope of the acoustic units and prosodic features such as fundamental frequency and average speaking rate. In this paper, we present a new approach to modeling speaking rate so that it can be easily incorporated into voice fonts and used for personality transformation. We model a speaker's speaking rate as the average duration of various acoustic units and unit categories. With the proposed approach, this speaking-rate model can be extracted automatically from a speech corpus recorded in the speaker's voice. We show how the approach can be implemented and evaluate its performance through several subjective tests.
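
As an illustration of what "average duration per acoustic unit and category" could look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes the corpus has already been segmented into labeled units with start and end times (for example by forced alignment, which the abstract does not specify). The function name `average_durations`, the `CATEGORY` table, and the phone labels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from unit labels to broad categories (not from the paper).
CATEGORY = {"a": "vowel", "i": "vowel", "k": "stop", "t": "stop", "s": "fricative"}


def average_durations(segments):
    """Return (per-unit, per-category) mean durations in seconds.

    segments: iterable of (label, start_sec, end_sec) tuples, assumed to come
    from some prior segmentation of the speaker's corpus.
    """
    unit_totals = defaultdict(lambda: [0.0, 0])  # label -> [sum, count]
    cat_totals = defaultdict(lambda: [0.0, 0])   # category -> [sum, count]

    for label, start, end in segments:
        duration = end - start
        if duration <= 0:
            continue  # skip malformed segments
        cat = CATEGORY.get(label, "other")
        for totals, key in ((unit_totals, label), (cat_totals, cat)):
            totals[key][0] += duration
            totals[key][1] += 1

    unit_avg = {u: total / n for u, (total, n) in unit_totals.items()}
    cat_avg = {c: total / n for c, (total, n) in cat_totals.items()}
    return unit_avg, cat_avg


# Toy segmentation of one utterance (made-up times, for illustration only).
segments = [("a", 0.00, 0.10), ("k", 0.10, 0.15), ("a", 0.15, 0.35), ("t", 0.35, 0.40)]
unit_avg, cat_avg = average_durations(segments)
```

Under these assumptions, the two dictionaries are the kind of compact table that could be stored alongside the other voice-font descriptors; the category averages give a fallback for units that are rare or missing in the corpus.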

Bibliographic reference. Verma, Ashish / Kumar, Arun (2003): "Modeling speaking rate for voice fonts", in EUROSPEECH-2003, 2917-2920.