Segmentation of speech into syllables is beneficial for many spoken language processing applications since it provides information about phonological and rhythmic aspects of speech. Traditional methods usually detect syllable nuclei using features such as energies in critical bands, linear predictive coding spectra, pitch, voicing, etc. Here, a novel system that uses auditory attention cues is proposed for predicting syllable boundaries. The auditory attention cues are biologically inspired and capture changes in sound characteristic by using 2D spectro-temporal receptive filters. When tested on TIMIT, it is shown that the proposed method successfully predicts syllable boundaries and performs as good as or better than the state-of-the art syllable nucleus detection methods.
Bibliographic reference. Kalinli, Ozlem (2011): "Syllable segmentation of continuous speech using auditory attention cues", In INTERSPEECH-2011, 425-428.