10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

AM-FM Estimation for Speech Based on a Time-Varying Sinusoidal Model

Yannis Pantazis (1), Olivier Rosec (2), Yannis Stylianou (1)

(1) FORTH, Greece
(2) Orange Labs, France

In this paper we present a method, based on a time-varying sinusoidal model, for robust and accurate estimation of amplitude and frequency modulations (AM-FM) in speech. The suggested approach has two main steps. First, speech is represented by a sinusoidal model with time-varying amplitudes. Specifically, the model uses a first-order time polynomial with complex coefficients to capture the instantaneous amplitude and frequency (phase) components. Next, the model parameters are updated using the previously estimated instantaneous phase information. The result is an iterative scheme for the AM-FM decomposition of speech. The scheme was validated on synthetic AM-FM signals and tested on the reconstruction of voiced speech signals, with the signal-to-error reconstruction ratio (SERR) used as the measure. Compared to the standard sinusoidal representation, the suggested approach was found to improve the corresponding SERR by 47%, resulting in over 30 dB of SERR.
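
As a rough illustration of the first step only, the sketch below fits a sum of sinusoids whose complex amplitudes vary linearly in time, x(t) ≈ Σ_k (a_k + t·b_k)·exp(j2πf_k·t), by linear least squares, and then scores the reconstruction. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the use of a plain unweighted least-squares fit, the fixed input frequencies, and the SERR formula (taken here as 20·log10 of the ratio of the signal's standard deviation to that of the reconstruction error). The iterative parameter update described in the abstract is not shown.

```python
import numpy as np


def fit_sinusoids(x, t, freqs, time_varying=True):
    """Least-squares fit of x(t) ~ sum_k (a_k + t*b_k) * exp(j*2*pi*f_k*t).

    Hypothetical helper, not the authors' code. With time_varying=False the
    b_k terms are dropped, giving a stationary sinusoidal fit.
    For a real signal, freqs should contain both +f and -f.
    """
    t = np.asarray(t, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    E = np.exp(2j * np.pi * np.outer(t, freqs))       # (N, K) complex exponentials
    A = np.hstack([E, t[:, None] * E]) if time_varying else E
    coef, *_ = np.linalg.lstsq(A, np.asarray(x, dtype=complex), rcond=None)
    K = freqs.size
    a = coef[:K]                                      # constant complex amplitudes
    b = coef[K:] if time_varying else np.zeros(K, dtype=complex)  # slope terms
    x_hat = (A @ coef).real                           # reconstructed frame
    return a, b, x_hat


def serr_db(x, x_hat):
    """Assumed SERR definition: 20*log10(std(x) / std(x - x_hat)), in dB."""
    return 20.0 * np.log10(np.std(x) / np.std(x - x_hat))


if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(-160, 161) / fs                     # 20 ms frame centered at t = 0
    # Synthetic test tone with a linear amplitude ramp (illustrative values).
    x = (1.0 + 20.0 * t) * np.cos(2 * np.pi * 200.0 * t)
    for flag in (False, True):
        _, _, x_hat = fit_sinusoids(x, t, [-200.0, 200.0], time_varying=flag)
        print("time_varying =", flag, "SERR (dB):", serr_db(x, x_hat))
```

On this toy signal the time-varying fit can represent the amplitude ramp exactly, while the stationary fit cannot, so the former yields the higher SERR. Real speech frames would additionally need frequency estimates for each component, which this sketch simply takes as given.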


Bibliographic reference. Pantazis, Yannis / Rosec, Olivier / Stylianou, Yannis (2009): "AM-FM estimation for speech based on a time-varying sinusoidal model", in INTERSPEECH-2009, 104-107.