Multicomponent 2-D AM-FM Modeling of Speech Spectrograms

Jitendra Kumar Dhiman, Neeraj Sharma, Chandra Sekhar Seelamantula


In contrast to 1-D short-time analysis of speech, 2-D modeling of spectrograms characterizes speech attributes directly in the joint time-frequency plane. Building on existing 2-D models for analyzing a spectrogram patch, we propose a multicomponent 2-D AM-FM representation for spectrogram decomposition. The proposed representation comprises a DC component, a fundamental-frequency carrier and its harmonics, and a spectrotemporal envelope, all in 2-D. The number of harmonics required is patch-dependent. The AM and FM are estimated using the Riesz transform, and the component weights are estimated using a least-squares approach. The proposed representation improves on existing state-of-the-art approaches for both male and female speakers, as quantified by reconstruction SNR and the perceptual evaluation of speech quality (PESQ) metric. Further, we perform an overlap-add on the DC component, pooling all the patches, to obtain a time-frequency (t-f) aperiodicity map for the speech signal. We verify its effectiveness in improving speech synthesis quality by using it in an existing state-of-the-art vocoder.
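The two estimation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the model equations, so the code assumes one plausible reading in which a patch is an envelope multiplied by a DC term plus weighted cosine harmonics of a known 2-D carrier phase. The function names (riesz_transform, monogenic_amplitude, fit_weights), the toy patch size, and the carrier frequency are all hypothetical choices made for illustration. The AM estimate uses the standard FFT-domain Riesz transform and the monogenic amplitude; the weights come from an ordinary least-squares fit.

```python
import numpy as np

def riesz_transform(f):
    """FFT-based Riesz transform pair of a real 2-D array (periodic boundaries assumed)."""
    rows, cols = f.shape
    u = np.fft.fftfreq(rows)[:, None]   # frequency along axis 0, cycles/sample
    v = np.fft.fftfreq(cols)[None, :]   # frequency along axis 1, cycles/sample
    mag = np.sqrt(u**2 + v**2)
    mag[0, 0] = 1.0                     # avoid 0/0 at DC; the numerator is 0 there
    F = np.fft.fft2(f)
    r1 = np.real(np.fft.ifft2(-1j * u / mag * F))
    r2 = np.real(np.fft.ifft2(-1j * v / mag * F))
    return r1, r2

def monogenic_amplitude(f):
    """AM estimate: magnitude of the monogenic signal (f, r1, r2)."""
    r1, r2 = riesz_transform(f)
    return np.sqrt(f**2 + r1**2 + r2**2)

def fit_weights(patch, envelope, phase, num_harmonics):
    """Least-squares weights for the ASSUMED model
    patch ~ envelope * (w0 + sum_k w_k * cos(k * phase))."""
    columns = [envelope.ravel()]                        # DC column
    for k in range(1, num_harmonics + 1):
        columns.append((envelope * np.cos(k * phase)).ravel())
    A = np.stack(columns, axis=1)
    w, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
    return w

# Toy 32x32 patch with a carrier that lies exactly on the FFT grid (hypothetical values).
n, m = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
phase = 2 * np.pi * (3 * n + 2 * m) / 32
envelope = np.exp(-((n - 16) ** 2 + (m - 16) ** 2) / (2 * 12.0 ** 2))  # smooth, positive

w_true = np.array([0.5, 1.0, 0.4, 0.1])                # DC + three harmonics
patch = envelope * (w_true[0] + sum(w_true[k] * np.cos(k * phase) for k in range(1, 4)))

w_hat = fit_weights(patch, envelope, phase, num_harmonics=3)

# AM of a constant-amplitude carrier: the monogenic amplitude should be flat.
amp = monogenic_amplitude(0.7 * np.cos(phase))
```

In this noise-free toy case the fitted weights match the ones used to synthesize the patch, and the monogenic amplitude of the pure carrier is constant. On real spectrogram patches the envelope and carrier phase would themselves have to be estimated first, and the number of harmonics would be chosen per patch, as the abstract notes.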


DOI: 10.21437/Interspeech.2018-1937

Cite as: Dhiman, J.K., Sharma, N., Seelamantula, C.S. (2018) Multicomponent 2-D AM-FM Modeling of Speech Spectrograms. Proc. Interspeech 2018, 736-740, DOI: 10.21437/Interspeech.2018-1937.


@inproceedings{Dhiman2018,
  author={Jitendra Kumar Dhiman and Neeraj Sharma and Chandra Sekhar Seelamantula},
  title={Multicomponent 2-D AM-FM Modeling of Speech Spectrograms},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={736--740},
  doi={10.21437/Interspeech.2018-1937},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1937}
}