5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

High Resolution Prosody Modification for Speech Synthesis

Francisco M. Gimenez de los Galanes, David Talkin

Entropic Research Laboratory, Washington, DC, USA

In this paper we introduce RTIPS, a system for arbitrary, high-resolution modification of the prosodic variables of speech: fundamental frequency, rhythm (segmental duration), and intensity. It is based on the Resample and Overlap-Add (R-OLA) algorithm for fundamental-frequency and duration modification of speech. The algorithm works pitch-synchronously so that it can modify the pitch contour accurately, and it uses estimates of the glottal closure instants (epochs) as synchronization marks. The technique is very similar to other OLA-based methods for time or pitch modification, but because of the added resampling step, voice quality after resynthesis (especially for high-pitched voices) is much more natural at any given output sampling frequency. The reliability of R-OLA depends heavily on the accuracy of the epoch detection method, so this preprocessing step has to be designed carefully.
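To make the resample-then-overlap-add idea concrete, here is a toy sketch. It is not the RTIPS implementation. It assumes the epochs are already known (epoch detection is not shown) and takes two-period segments centred on each epoch. Each segment is shaped with a Hann window, and its input periods are resampled to fit the target output periods before overlap-adding. Duration is changed by repeating or skipping source periods, in the manner of PSOLA-style methods. The function name `r_ola`, the Hann window, linear interpolation via `np.interp`, and the nearest-epoch selection rule are all assumptions made for illustration.

```python
import numpy as np


def r_ola(x, epochs, pitch_factor=1.0, time_factor=1.0):
    """Toy resample-and-overlap-add prosody modifier (hypothetical sketch).

    x: speech samples (1-D array)
    epochs: strictly increasing sample indices of glottal closure instants
    pitch_factor: > 1 raises F0 (output period = input period / pitch_factor)
    time_factor: > 1 lengthens the signal (output time = input time * time_factor)
    """
    x = np.asarray(x, dtype=float)
    e = np.asarray(epochs, dtype=float)
    src = np.arange(len(x), dtype=float)

    # 1) Output epoch grid: advance by the local input period / pitch_factor and
    #    pick the input epoch whose time-scaled position is closest, so source
    #    periods are repeated (slower) or skipped (faster) as needed.
    out_e, src_k = [e[0] * time_factor], []
    while True:
        k = int(np.argmin(np.abs(e * time_factor - out_e[-1])))
        src_k.append(k)
        if k == len(e) - 1:  # reached the last input epoch
            break
        out_e.append(out_e[-1] + (e[k + 1] - e[k]) / pitch_factor)
    out_e = np.asarray(out_e)

    y = np.zeros(int(np.ceil(out_e[-1])) + 1)
    # 2) Overlap-add one two-period segment per interior output epoch.
    for j in range(1, len(out_e) - 1):
        k = src_k[j]
        if k == 0 or k == len(e) - 1:
            continue  # no full two-period input segment around this epoch
        n = np.arange(int(np.ceil(out_e[j - 1])), int(np.ceil(out_e[j + 1])))
        left = n < out_e[j]
        # Normalised position in [0, 1) within the left or right output period.
        u = np.where(left,
                     (n - out_e[j - 1]) / (out_e[j] - out_e[j - 1]),
                     (n - out_e[j]) / (out_e[j + 1] - out_e[j]))
        # Hann halves (rising, then falling) so that neighbouring segments sum to one.
        w = np.where(left, 0.5 - 0.5 * np.cos(np.pi * u),
                     0.5 + 0.5 * np.cos(np.pi * u))
        # Resampling step: stretch each input period to fit its output period.
        scale = np.where(left,
                         (e[k] - e[k - 1]) / (out_e[j] - out_e[j - 1]),
                         (e[k + 1] - e[k]) / (out_e[j + 1] - out_e[j]))
        t = e[k] + (n - out_e[j]) * scale
        y[n] += w * np.interp(t, src, x)
    return y
```

With `pitch_factor=1.0` and `time_factor=1.0`, the output grid equals the input epochs and every segment maps back onto itself. The sketch therefore reproduces the input between the second and the next-to-last epoch. This also illustrates why epoch accuracy matters: misplaced marks change the period lengths used for resampling and windowing.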

