Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Duration Modeling of Indian Languages Hindi and Telugu

N. Sridhar Krishna, Hema A. Murthy

Department of Computer Science and Engineering, Indian Institute of Technology, Madras, India

This paper reports a preliminary attempt at data-driven modeling of segmental (phoneme) duration for two Indian languages, Hindi and Telugu. A Classification and Regression Tree (CART) based approach to segmental duration prediction is presented. A number of features are proposed, and their usefulness and relative contribution to segmental duration prediction are assessed. The duration models are evaluated objectively using the root mean squared prediction error (RMSE) and the correlation between actual and predicted durations. The models have been implemented in an Indian-language Text-to-Speech synthesis system [1] being developed within the Festival framework [2].
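
The two objective measures named in the abstract can be illustrated with a minimal sketch. It assumes durations are given in milliseconds and that "correlation" means the Pearson coefficient; the abstract specifies neither. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted durations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def pearson_correlation(actual, predicted):
    """Pearson correlation between actual and predicted durations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical phoneme durations in milliseconds (illustration only,
# not data from the paper).
actual_ms = [80.0, 120.0, 95.0, 60.0]
predicted_ms = [90.0, 110.0, 100.0, 70.0]

print("RMSE (ms):", rmse(actual_ms, predicted_ms))
print("Correlation:", pearson_correlation(actual_ms, predicted_ms))
```

A lower RMSE and a correlation closer to 1 both indicate that predicted durations track the observed ones more closely; the two are usually reported together because RMSE is sensitive to scale while correlation is not.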

References

  1. Sridhar Krishna, N., Hema A. Murthy, and Timothy A. Gonsalves, "Text-to-Speech in Indian Languages," in International Conference on Natural Language Processing, Mumbai, India, 2002, pp. 317-326.
  2. Black, A.W., Paul Taylor, and Richard Caley, The Festival Speech Synthesis System. Manual and source code available at http://www.cstr.ed.ac.uk/projects/festival.html (CSTR web page).

Bibliographic reference.  Krishna, N. Sridhar / Murthy, Hema A. (2004): "Duration modeling of Indian languages Hindi and Telugu", In SSW5-2004, 197-202.