Speech can be segmented into syllables by identifying the syllable nuclei, which are points of high sonority. The excitation peaks in the linear prediction (LP) residual and the formant peaks can be interpreted as perceptually significant point features that contribute to the loudness of speech. This paper describes the use of these two point features for detecting syllable nuclei. Each source of evidence carries information about a different aspect of speech production, namely the glottal vibrations and the time-varying vocal tract system. They may therefore contain complementary information about the syllable nuclei. The performance of the proposed syllable nuclei detection algorithm is evaluated on the TIMIT, Switchboard and NTIMIT corpora. The proposed method performs comparably to two other state-of-the-art syllable nuclei detection methods and is shown to perform better on conversational speech. It is very fast and requires no training.
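As an illustration of the first point feature only, the following is a minimal sketch of how an LP residual can be computed and its strongest samples located. It is not the authors' algorithm: it uses the textbook autocorrelation method with the Levinson-Durbin recursion, a plain relative threshold in place of any perceptual weighting, and omits the formant-peak evidence entirely. All function names, the synthetic test signal, the LP order, and the threshold value are assumptions made for this example.

```python
def autocorr(x, order):
    """Autocorrelation of x at lags 0..order."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns a with a[0] = 1 so that
    the residual is e[n] = sum_k a[k] * x[n - k]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lp_residual(x, order):
    """Inverse-filter x with its own LP coefficients (samples before n = 0 taken as zero)."""
    a = levinson_durbin(autocorr(x, order), order)
    res = [sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
           for n in range(len(x))]
    return a, res


def strong_residual_samples(res, rel_threshold=0.5):
    """Indices whose residual magnitude reaches rel_threshold times the maximum
    (an assumed, simplistic stand-in for excitation-peak picking)."""
    limit = rel_threshold * max(abs(v) for v in res)
    return [n for n, v in enumerate(res) if abs(v) >= limit]


def synthetic_voiced(n, period):
    """Hypothetical test signal: an impulse train passed through a fixed
    two-pole resonator, x[n] = e[n] + 1.3 x[n-1] - 0.8 x[n-2]."""
    x = []
    for i in range(n):
        e = 1.0 if i % period == 0 else 0.0
        x1 = x[i - 1] if i >= 1 else 0.0
        x2 = x[i - 2] if i >= 2 else 0.0
        x.append(e + 1.3 * x1 - 0.8 * x2)
    return x
```

On the synthetic signal, calling `lp_residual(synthetic_voiced(400, 50), 2)` and passing the residual to `strong_residual_samples` should recover the impulse positions, since inverse filtering removes the resonator and leaves the excitation. Real speech would need short-time framing and a higher LP order, which this sketch leaves out.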
Bibliographic reference. Arrabothu, Apoorv Reddy / Chennupati, Nivedita / Yegnanarayana, B. (2013): "Syllable nuclei detection using perceptually significant features", In INTERSPEECH-2013, 963-967.