Automatic segmentation of speech signals has long been an engineering challenge. Despite advances in both supervised and unsupervised techniques, matching the accuracy of manually labelled segments remains difficult. HMM-based segmentation techniques, with various modifications and corrections, have been the state of the art. These techniques are supervised and therefore require a large corpus transcribed with phone boundaries. Unsupervised techniques, on the other hand, exploit changes in various spectral and temporal properties of the speech signal. This paper presents a new unsupervised method for segmenting speech signals based on signal processing techniques. A recently developed method known as Zero Time Liftering (ZTL) is used to analyze the speech signal. It provides fine temporal resolution of the spectral features of the segment being analyzed. ZTL uses the Hilbert envelope of the numerator group delay (HNGD) of the signal to highlight its spectral activity. This representation is used to extract high-SNR regions of the spectrum, which in turn prove useful for representing the production characteristics of the speech signal. The performance of the proposed method is on par with that of existing baseline systems for unsupervised segmentation.
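The HNGD representation can be illustrated with a short Python sketch. Because the abstract gives no formulas, this sketch assumes the commonly described ZTL formulation. A short segment x(n) of length L is multiplied by a zero-time window w1(n) = 1/(4 sin²(πn/2N)), with w1(0) = 0 and N the DFT length, and by a tapering window w2(n) = 4 cos²(πn/2L). The numerator group delay g(k) = X_R(k)Y_R(k) + X_I(k)Y_I(k) is then computed from the DFTs of the windowed segment and of n times the windowed segment. Next, g(k) is differenced twice to sharpen resonance peaks, and finally its Hilbert envelope is taken. The function name hngd_spectrum, the default DFT size, and the synthetic test segment are hypothetical choices, not values taken from the paper.

```python
import numpy as np
from scipy.signal import hilbert


def hngd_spectrum(segment, n_fft=1024):
    """HNGD spectrum of one short speech segment (illustrative sketch).

    Assumes the commonly described ZTL formulation; not the authors' code.
    """
    x = np.asarray(segment, dtype=float)
    L = len(x)
    if L > n_fft:
        raise ValueError("segment must not be longer than the DFT size")
    n = np.arange(L)

    # Zero-time window: approximately 1/n^2, heavily emphasising samples
    # near n = 0; w1(0) is defined as 0 to avoid division by zero.
    w1 = np.zeros(L)
    w1[1:] = 1.0 / (4.0 * np.sin(np.pi * n[1:] / (2.0 * n_fft)) ** 2)

    # Tapering window that reduces the truncation effect at the segment end.
    w2 = 4.0 * np.cos(np.pi * n / (2.0 * L)) ** 2
    xw = x * w1 * w2

    # Numerator group delay from the DFTs of x(n) and n*x(n) (zero-padded).
    X = np.fft.fft(xw, n_fft)
    Y = np.fft.fft(n * xw, n_fft)
    ngd = X.real * Y.real + X.imag * Y.imag

    # Double differencing removes the smooth trend and sharpens the peaks.
    dngd = np.diff(np.diff(ngd))

    # Hilbert envelope of the differenced NGD, kept for positive frequencies.
    env = np.abs(hilbert(dngd))
    return env[: n_fft // 2]
```

Computing this spectrum for a segment starting at every sample instant yields a spectral representation with fine temporal resolution. How the high-SNR regions and segment boundaries are then derived from it is not sketched here.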
Bibliographic reference. Prasad, RaviShankar / Yegnanarayana, B. (2013): "Acoustic segmentation of speech using zero time liftering (ZTL)", In INTERSPEECH-2013, 2292-2296.