A Spectro-Temporal Demodulation Technique for Pitch Estimation

Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula


We consider a two-dimensional demodulation framework for spectro-temporal analysis of the speech signal. We construct narrowband (NB) speech spectrograms and demodulate them using the Riesz transform, a two-dimensional extension of the Hilbert transform. Demodulation yields a time-frequency envelope (amplitude modulation, AM) and a time-frequency carrier (frequency modulation, FM). The AM component corresponds to the vocal tract and is referred to as the vocal tract spectrogram. The FM component corresponds to the underlying excitation and is referred to as the carrier spectrogram. The carrier spectrogram exhibits a high degree of time-frequency consistency for voiced sounds, whereas unvoiced sounds lack such structure. In addition, the carrier spectrogram reflects the variation of the fundamental frequency (F0) of the speech signal. We develop a technique to estimate F0 from the carrier spectrogram and use the time-frequency consistency to determine which time-frequency regions correspond to voiced segments. Comparisons with state-of-the-art F0 estimation algorithms show that the proposed F0 estimator is highly accurate on telephone-channel speech and robust to noise.
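The abstract describes two concrete steps: building a narrowband spectrogram and demodulating it with the Riesz transform into an AM (envelope) part and an FM (carrier) part. The NumPy sketch below illustrates both steps under stated assumptions. The 40 ms Hann window, 5 ms hop, log-magnitude scaling, and a single global FFT-based Riesz transform are choices made here, not details taken from the paper, and the function names `narrowband_spectrogram` and `riesz_demodulate` are hypothetical. The paper's own preprocessing, voicing decision, and F0 estimator are not reproduced.

```python
import numpy as np


def narrowband_spectrogram(x, fs, win_ms=40.0, hop_ms=5.0, nfft=1024):
    """Log-magnitude STFT with a long window so individual harmonics are resolved.

    Assumed parameters (hypothetical, not from the paper): 40 ms Hann window,
    5 ms hop. Returns shape (nfft // 2 + 1, n_frames): rows are frequency bins,
    columns are time frames.
    """
    win = int(round(win_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    w = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * w for i in range(n_frames)], axis=1)
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=0))
    return np.log(mag + 1e-10)  # small floor avoids log(0)


def riesz_demodulate(spec, eps=1e-12):
    """Split a 2-D array into AM (envelope) and FM (carrier) via the Riesz transform.

    Assumption: one global FFT-based Riesz transform of the mean-removed
    spectrogram, treating one frequency bin and one frame as equal unit steps.
    """
    s = spec - spec.mean()                       # carrier should oscillate about zero
    w1 = np.fft.fftfreq(s.shape[0])[:, None]     # normalized frequency along the frequency axis
    w2 = np.fft.fftfreq(s.shape[1])[None, :]     # normalized frequency along the time axis
    rho = np.sqrt(w1 ** 2 + w2 ** 2)
    rho[0, 0] = 1.0                              # DC: numerators are zero, avoid 0/0
    S = np.fft.fft2(s)
    # Riesz multipliers -j*w_k/|w|; keep the real part (the imaginary residue is
    # round-off plus Nyquist-bin effects for even-sized arrays)
    r1 = np.real(np.fft.ifft2(-1j * w1 / rho * S))
    r2 = np.real(np.fft.ifft2(-1j * w2 / rho * S))
    am = np.sqrt(s ** 2 + r1 ** 2 + r2 ** 2)     # local amplitude (AM / envelope)
    fm = s / (am + eps)                          # cos(local phase) (FM / carrier)
    return am, fm
```

For example, `am, fm = riesz_demodulate(narrowband_spectrogram(x, 8000))` on an 8 kHz signal `x` returns two arrays with the spectrogram's shape, and `|fm| <= 1` holds by construction because `am >= |s|`. A single global transform is the simplest choice; demodulating local spectrogram patches would follow slowly varying harmonic spacing more closely.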


 DOI: 10.21437/Interspeech.2017-1138

Cite as: Dhiman, J.K., Adiga, N., Seelamantula, C.S. (2017) A Spectro-Temporal Demodulation Technique for Pitch Estimation. Proc. Interspeech 2017, 2306-2310, DOI: 10.21437/Interspeech.2017-1138.


@inproceedings{Dhiman2017,
  author={Jitendra Kumar Dhiman and Nagaraj Adiga and Chandra Sekhar Seelamantula},
  title={A Spectro-Temporal Demodulation Technique for Pitch Estimation},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2306--2310},
  doi={10.21437/Interspeech.2017-1138},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1138}
}