INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Nearly Perfect Detection of Continuous F_0 Contour and Frame Classification for TTS Synthesis

Thomas Ewender, Sarah Hoffmann, Beat Pfister

ETH Zürich, Switzerland

We present a new method for estimating a continuous fundamental frequency (F0) contour. The algorithm performs a global optimization and yields virtually error-free F0 contours for high-quality speech signals. These F0 contours are subsequently used to extract a continuous fundamental wave. Some local properties of this wave, together with a number of other speech features, allow the frames of a speech signal to be classified into five classes: voiced, unvoiced, mixed, irregularly glottalized, and silence. The presented F0 detection and frame classification can be applied to F0 modeling and to the prosodic modification of speech segments in high-quality concatenative speech synthesis.
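The abstract does not say how the global optimization is carried out. As an illustration only, and not the authors' algorithm, the following minimal sketch shows one common way to choose an F0 contour globally rather than frame by frame: dynamic programming over per-frame F0 candidates. The function name `track_f0`, the candidate lists, the local costs, and the octave-based jump penalty are all assumptions introduced for this example.

```python
import math


def track_f0(candidates, local_costs, jump_penalty=1.0):
    """Pick one F0 candidate per frame so that the total cost over the
    whole utterance is minimal (hypothetical illustration).

    candidates[t][j]  : j-th F0 candidate in Hz for frame t
    local_costs[t][j] : cost of that candidate on its own (lower is better)
    jump_penalty      : weight of the pitch jump between frames, in octaves
    """
    n = len(candidates)
    if n == 0:
        return []

    # best[t][j]: minimal accumulated cost of any path ending in candidate j at frame t
    best = [list(local_costs[0])]
    # back[t][j]: index of the predecessor candidate on that best path
    back = [[-1] * len(candidates[0])]

    for t in range(1, n):
        row, ptr = [], []
        for j, f in enumerate(candidates[t]):
            # Accumulated cost of reaching candidate j from each predecessor
            costs = [
                best[t - 1][i] + jump_penalty * abs(math.log2(f / g))
                for i, g in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[i_min] + local_costs[t][j])
            ptr.append(i_min)
        best.append(row)
        back.append(ptr)

    # Backtrack from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for t in range(n - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    return path[::-1]


# Toy input (made up): frame 1 locally prefers the octave error at 200 Hz
toy_candidates = [[100.0, 200.0], [100.0, 200.0], [105.0, 210.0]]
toy_costs = [[0.1, 0.3], [0.4, 0.3], [0.1, 0.2]]
contour = track_f0(toy_candidates, toy_costs)
```

In this toy input a frame-by-frame decision would jump to 200 Hz in the middle frame, whereas the global path stays near 100 Hz because the octave jump is penalized. Setting `jump_penalty=0` reduces the search to independent per-frame decisions.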

Bibliographic reference.  Ewender, Thomas / Hoffmann, Sarah / Pfister, Beat (2009): "Nearly perfect detection of continuous f_0 contour and frame classification for TTS synthesis", In INTERSPEECH-2009, 100-103.