This paper introduces a framework for parametric speech modeling that can be used in various speech applications, such as text-to-speech synthesis and voice conversion. To reduce the impact of pitch variations, harmonic analysis is performed on a warped time scale aligned with the instantaneous pitch values. Each harmonic is assumed to have its own time-varying periodic excitation source, modeled as a sum of several sinusoidal components with close frequencies. The parameters of these excitation components are estimated using a modified instantaneous Prony's method. The proposed analysis/synthesis technique is compared with TANDEM-STRAIGHT.
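The estimation step can be illustrated with the classic least-squares Prony's method. This method fits a sum of complex exponentials to a sampled signal in two stages: linear prediction, followed by rooting of the prediction polynomial. What follows is a minimal sketch under stated assumptions, not the authors' modified instantaneous variant. It works on a single stationary frame with no time warping, and the function name `prony_sinusoids` and the test signal are hypothetical. The example separates two real sinusoids with close frequencies (0.30 and 0.32 rad/sample), which is the situation the multicomponent excitation model describes.

```python
import numpy as np


def prony_sinusoids(x, num_sinusoids):
    """Hypothetical helper: classic least-squares Prony's method (not the
    paper's modified instantaneous variant). Returns frequencies (rad/sample),
    amplitudes and phases of real sinusoids in one stationary frame."""
    x = np.asarray(x, dtype=float)
    order = 2 * num_sinusoids  # each real cosine = a conjugate pair of exponentials
    n = len(x)
    # Linear prediction: x[k] + sum_{m=1}^{order} a_m x[k-m] = 0 for k >= order
    rows = np.array([x[k - order:k][::-1] for k in range(order, n)])
    a, *_ = np.linalg.lstsq(rows, -x[order:], rcond=None)
    # Roots of z^order + a_1 z^(order-1) + ... + a_order are z_i = exp(j*w_i)
    roots = np.roots(np.concatenate(([1.0], a)))
    # Complex amplitudes h_i from the fit x[k] = sum_i h_i z_i^k
    vander = np.vander(roots, n, increasing=True).T  # shape (n, order)
    h, *_ = np.linalg.lstsq(vander, x.astype(complex), rcond=None)
    freqs = np.angle(roots)
    keep = freqs > 0  # keep one root of each conjugate pair
    idx = np.argsort(freqs[keep])
    # For A*cos(w*k + phi), the positive-frequency coefficient is (A/2)*exp(j*phi)
    return (freqs[keep][idx],
            2.0 * np.abs(h[keep])[idx],
            np.angle(h[keep])[idx])


# Usage: two components with close frequencies, noise-free
k = np.arange(200)
signal = 1.0 * np.cos(0.30 * k + 0.2) + 0.5 * np.cos(0.32 * k - 1.0)
freqs, amps, phases = prony_sinusoids(signal, 2)
print(freqs, amps, phases)
```

Each real cosine contributes a conjugate pair of roots, so the model order is twice the number of components. Keeping only the positive-frequency roots yields one estimate per component. On noise-free input like this, the estimates should closely match the true parameters. Classic Prony's method is, however, known to be sensitive to noise.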
Bibliographic reference. Azarov, Elias / Vashkevich, Maxim / Petrovsky, Alexander (2013): "Instantaneous harmonic representation of speech using multicomponent sinusoidal excitation", In INTERSPEECH-2013, 1697-1701.