5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A Mixed-Excitation Frequency Domain Model for Time-Scale Pitch-Scale Modification of Speech

Alex Acero

Microsoft Research, USA

This paper presents a time-scale pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. This model is tightly coupled with a compression scheme that results in compact acoustic inventories. When compared to the approach in the Whistler system, which uses no mixed excitation, the new method shows improvement in voiced fricatives and over-stretched voiced sounds. In addition, it allows for spectral manipulation such as smoothing of discontinuities at unit boundaries, voice transformations, or loudness equalization.

Bibliographic reference. Acero, Alex (1998): "A mixed-excitation frequency domain model for time-scale pitch-scale modification of speech", in ICSLP-1998, paper 0072.