Decomposing speech signals into periodic and aperiodic components is an important task, with applications in speech synthesis, coding, and denoising, among others. In this paper, we construct a time-frequency coherence function to analyze spectro-temporal signatures of speech signals and to distinguish between the deterministic and stochastic components of speech. The narrowband speech spectrogram is segmented into patches, which are represented as 2-D cosine carriers modulated in amplitude and frequency. The carrier is separated from the amplitude and frequency modulations by 2-D demodulation using the Riesz transform, which is the 2-D extension of the Hilbert transform. The demodulated AM component reflects the contribution of the vocal tract to the spectrogram. The frequency-modulated carrier (FM-carrier) signal exhibits properties of the excitation. The time-frequency coherence is defined with respect to the FM-carrier, and a coherence map is constructed in which highly coherent regions represent nearly periodic, deterministic components of speech, whereas incoherent regions correspond to unstructured components. The coherence map shows a clear distinction between the deterministic and stochastic components of speech, which are characterized by jitter, shimmer, lip radiation, type of excitation, etc. Binary masks prepared from the time-frequency coherence function are used for periodic-aperiodic decomposition of speech. Experimental results are presented to validate the efficiency of the proposed method.
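The 2-D demodulation step described above can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so this is not their implementation. It assumes a spectrogram patch is well modeled as an amplitude times a 2-D cosine, computes the two Riesz components in the Fourier domain, and reads off the amplitude (AM) and the unit-amplitude carrier. The function names `riesz_transform` and `demodulate_patch` are hypothetical.

```python
import numpy as np


def riesz_transform(patch):
    """Return the two Riesz components of a 2-D array (FFT-based sketch)."""
    rows, cols = patch.shape
    # Normalized frequency grids along each axis.
    u = np.fft.fftfreq(rows)[:, None]
    v = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0  # avoid 0/0 at DC; the multiplier is 0 there anyway
    spectrum = np.fft.fft2(patch)
    # Frequency responses -j*u/|w| and -j*v/|w| (2-D analogue of the Hilbert transform).
    r1 = np.real(np.fft.ifft2(-1j * u / radius * spectrum))
    r2 = np.real(np.fft.ifft2(-1j * v / radius * spectrum))
    return r1, r2


def demodulate_patch(patch, eps=1e-12):
    """Split a patch into an amplitude (AM) estimate and a cosine carrier.

    Assumption for illustration: patch ~ amplitude * cos(phase), after
    removing the patch mean.
    """
    centered = patch - patch.mean()
    r1, r2 = riesz_transform(centered)
    # Amplitude of the monogenic signal (patch plus its two Riesz components).
    amplitude = np.sqrt(centered ** 2 + r1 ** 2 + r2 ** 2)
    # Dividing out the amplitude leaves the carrier, cos(phase).
    carrier = centered / np.maximum(amplitude, eps)
    return amplitude, carrier
```

For a pure 2-D cosine whose frequencies fall on exact FFT bins, this sketch recovers a constant amplitude and the original cosine as the carrier. Real spectrogram patches would need windowing and care at patch boundaries, neither of which is handled here.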
Cite as: Vijayan, K., Dhiman, J.K., Seelamantula, C.S. (2017) Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals. Proc. Interspeech 2017, 329-333, doi: 10.21437/Interspeech.2017-726
@inproceedings{vijayan17_interspeech,
  author={Karthika Vijayan and Jitendra Kumar Dhiman and Chandra Sekhar Seelamantula},
  title={{Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={329--333},
  doi={10.21437/Interspeech.2017-726}
}