14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Instantaneous Harmonic Representation of Speech Using Multicomponent Sinusoidal Excitation

Elias Azarov, Maxim Vashkevich, Alexander Petrovsky

BSUIR, Belarus

This paper introduces a framework for parametric speech modeling that can be used in speech applications such as text-to-speech synthesis and voice conversion. To reduce the impact of pitch variations, the harmonic analysis is performed on a warped time scale that is aligned with the instantaneous pitch values. Each harmonic is assumed to have its own periodic excitation source that evolves in time and can be modeled as a sum of several sinusoidal components with closely spaced frequencies. The parameters of the excitation components are estimated using a modified instantaneous Prony's method. The proposed analysis/synthesis technique is compared with TANDEM-STRAIGHT.
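
As an illustration of the pitch-aligned time warping mentioned in the abstract, the following Python sketch resamples a signal so that every pitch period spans the same number of samples. It is not the authors' implementation. It assumes the instantaneous pitch contour is already known per sample, and it uses plain linear interpolation. The function name `warp_to_constant_pitch` and the parameter `samples_per_period` are hypothetical and chosen only for this example.

```python
import numpy as np

def warp_to_constant_pitch(signal, f0, fs, samples_per_period=32):
    """Resample `signal` on a time scale aligned with the pitch contour.

    signal : 1-D array of samples
    f0     : 1-D array, assumed instantaneous pitch in Hz for each sample (> 0)
    fs     : sampling rate in Hz
    Returns the warped signal and the original-time instants it was read at.
    """
    signal = np.asarray(signal, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    # Accumulated pitch phase in cycles (rectangle-rule integral of f0).
    cycles = np.concatenate(([0.0], np.cumsum(f0[:-1]) / fs))
    # Uniform grid in the warped domain: a fixed number of points per cycle.
    n_out = int(np.floor(cycles[-1] * samples_per_period)) + 1
    uniform_cycles = np.arange(n_out) / samples_per_period
    # Invert the phase curve to find where each grid point falls in time.
    t_orig = np.arange(len(signal)) / fs
    t_warped = np.interp(uniform_cycles, cycles, t_orig)
    # Read the signal at those instants (linear interpolation, an assumption).
    warped = np.interp(t_warped, t_orig, signal)
    return warped, t_warped

# Usage: a synthetic tone whose pitch glides from 100 Hz to 200 Hz.
fs = 16000
f0 = np.linspace(100.0, 200.0, fs // 2)
phase_cycles = np.concatenate(([0.0], np.cumsum(f0[:-1]) / fs))
x = np.cos(2.0 * np.pi * phase_cycles)
warped, t_warped = warp_to_constant_pitch(x, f0, fs)
```

After warping, the gliding tone has a constant period of `samples_per_period` samples, so a fixed-frequency harmonic analysis can be applied to it. A practical system would likely need a better interpolator than the linear one assumed here.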


Bibliographic reference. Azarov, Elias / Vashkevich, Maxim / Petrovsky, Alexander (2013): "Instantaneous harmonic representation of speech using multicomponent sinusoidal excitation", in INTERSPEECH-2013, 1697-1701.