5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Data Driven Formant Synthesis

Jesper Högberg

Department of Speech, Music and Hearing, KTH, Stockholm, Sweden

In this study we introduce a combination of data-driven and rule-based methods for speech synthesis. The aim is to improve coarticulatory modelling by adapting the KTH TTS system to data from a single speaker. Regression trees are trained on a manually corrected speech database to predict vowel formant frequencies. At runtime, the TTS system produces formant frequency trajectories derived from weighted contributions of both the rules and the regression trees. The weighting strategy allows flexible adjustment of the synthesis parameters and thus of the quality of the output speech. An informal perceptual test was conducted to compare the hybrid approach with the traditional rule-based system. A large majority of the test subjects judged the speech output of the hybrid system to be more natural than the rule-derived speech. The speech produced by the hybrid system was also generally preferred.
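The weighting of rule and regression tree contributions can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so the linear interpolation below is an assumption, and the function name `blend_formants`, the parameter `tree_weight`, and the frame layout (one tuple of formant frequencies in Hz per frame) are hypothetical.

```python
def blend_formants(rule_track, tree_track, tree_weight):
    """Blend two formant tracks frame by frame.

    Assumption: a simple linear interpolation, not the scheme used in the paper.
    rule_track, tree_track: lists of frames, each a tuple of formant
    frequencies in Hz (for example F1, F2, F3).
    tree_weight: 0.0 keeps the rule values, 1.0 keeps the tree predictions.
    """
    if not 0.0 <= tree_weight <= 1.0:
        raise ValueError("tree_weight must lie in [0, 1]")
    if len(rule_track) != len(tree_track):
        raise ValueError("tracks must have the same number of frames")

    blended = []
    for rule_frame, tree_frame in zip(rule_track, tree_track):
        if len(rule_frame) != len(tree_frame):
            raise ValueError("frames must have the same number of formants")
        # Weighted sum of the two predictions for each formant in the frame.
        blended.append(tuple(
            (1.0 - tree_weight) * r + tree_weight * t
            for r, t in zip(rule_frame, tree_frame)
        ))
    return blended


# Hypothetical two-frame example with F1 and F2 values in Hz.
rule = [(500.0, 1500.0), (520.0, 1480.0)]
tree = [(600.0, 1700.0), (560.0, 1520.0)]
midpoint = blend_formants(rule, tree, 0.5)
```

Under this assumption, a single weight moves the output continuously between purely rule-derived and purely tree-derived formant values.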

Bibliographic reference. Högberg, Jesper (1997): "Data driven formant synthesis", in EUROSPEECH-1997, 565-568.