ISCA Archive Interspeech 2013

Instantaneous harmonic representation of speech using multicomponent sinusoidal excitation

Elias Azarov, Maxim Vashkevich, Alexander Petrovsky

This paper introduces a framework for parametric speech modeling that can be used in various speech applications, such as text-to-speech synthesis and voice conversion. To reduce the impact of pitch variations, the harmonic analysis is performed in a warped time scale aligned with the instantaneous pitch values. Each harmonic is assumed to have its own periodic excitation source that evolves in time and can be modeled as a sum of several sinusoidal components with close frequencies. The parameters of these excitation components are estimated using a modified instantaneous Prony's method. The proposed analysis/synthesis technique is compared with TANDEM-STRAIGHT.
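The excitation model above, in which each harmonic is a sum of sinusoids with close frequencies, is the kind of signal Prony's method is designed to fit. The following is a minimal sketch of the *classical* least-squares Prony frequency estimator in plain Python. It is not the authors' modified instantaneous variant. It analyzes a single stationary frame rather than working in the pitch-warped time scale, and it assumes noiseless, undamped components. The helper names (`prony_frequencies`, `solve_linear`, `poly_roots`) are hypothetical, and the polynomial roots are found with the Durand-Kerner iteration.

```python
import cmath
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def poly_roots(coeffs, iters=2000):
    """Roots of z^M + coeffs[0] z^(M-1) + ... + coeffs[-1] via Durand-Kerner iteration."""
    deg = len(coeffs)

    def p(z):
        acc = 1.0 + 0j
        for c in coeffs:  # Horner evaluation of the monic polynomial
            acc = acc * z + c
        return acc

    roots = [(0.4 + 0.9j) ** k for k in range(deg)]  # standard distinct starting points
    for _ in range(iters):
        new = []
        for i, zi in enumerate(roots):
            denom = 1.0 + 0j
            for j, zj in enumerate(roots):
                if i != j:
                    denom *= zi - zj
            new.append(zi - p(zi) / denom)
        roots = new
    return roots


def prony_frequencies(x, p):
    """Hypothetical helper: estimate p sinusoid frequencies (rad/sample) from real samples x.

    Classical Prony (not the paper's modified instantaneous method); assumes a
    stationary, noiseless frame of undamped real sinusoids.
    """
    order = 2 * p  # each real sinusoid contributes one conjugate pole pair
    # Linear prediction: x[n] + a1 x[n-1] + ... + aM x[n-M] = 0 for n >= M.
    rows = [[x[n - m] for m in range(1, order + 1)] for n in range(order, len(x))]
    rhs = [-x[n] for n in range(order, len(x))]
    # Least-squares solution through the normal equations.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(order)]
    a = solve_linear(ata, atb)
    # The pole angles of the characteristic polynomial are the frequencies.
    freqs = sorted(abs(cmath.phase(z)) for z in poly_roots(a))
    return freqs[::2]  # keep one value per conjugate pair


# Usage: one "harmonic" made of two components with close frequencies.
x = [math.cos(0.30 * n + 0.2) + 0.6 * math.cos(0.36 * n + 1.0) for n in range(80)]
print(prony_frequencies(x, 2))
```

The model order is 2p because every real sinusoid corresponds to a conjugate pair of poles on the unit circle. With noisy speech data this plain least-squares form is fragile, and a higher model order or a regularized variant is usually needed.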


doi: 10.21437/Interspeech.2013-33

Cite as: Azarov, E., Vashkevich, M., Petrovsky, A. (2013) Instantaneous harmonic representation of speech using multicomponent sinusoidal excitation. Proc. Interspeech 2013, 1697-1701, doi: 10.21437/Interspeech.2013-33

@inproceedings{azarov13_interspeech,
  author={Elias Azarov and Maxim Vashkevich and Alexander Petrovsky},
  title={{Instantaneous harmonic representation of speech using multicomponent sinusoidal excitation}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1697--1701},
  doi={10.21437/Interspeech.2013-33}
}