Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification

Ravi Shankar, Archana Venkataraman


We present a novel approach for blind syllable segmentation that combines model-based feature selection with data-driven classification. In particular, we learn a function that maps short-term energy peaks of a speech utterance onto either the vowel or consonant class. The features used for classification capture spectral and energy signatures that are characteristic of the phonetic properties of the English language. The identified vowel peaks subsequently serve as the nuclei of our syllable segments. We demonstrate the effectiveness of our proposed method using nested cross-validation on 400 unique test utterances, drawn at random from the TIMIT dataset, that together contain over 5000 syllables. Our hybrid approach achieves a lower insertion rate than state-of-the-art segmentation methods and a lower deletion rate than all the baseline comparisons.
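As a rough illustration of the first stage described above, the following sketch computes a short-term energy contour and picks its local maxima as candidate peaks. It is not the authors' implementation: the function names, the 25 ms frame length, the 10 ms hop, and the synthetic two-burst signal are all assumptions made for illustration, and the vowel-consonant classifier itself is not shown.

```python
import numpy as np

def short_term_energy(signal, frame_len=400, hop=160):
    """Sum of squared samples per frame (frame_len and hop are assumed values)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def energy_peaks(energy):
    """Indices of frames that are strict local maxima of the energy contour."""
    e = np.asarray(energy)
    interior = (e[1:-1] > e[:-2]) & (e[1:-1] > e[2:])
    return np.flatnonzero(interior) + 1

# Synthetic stand-in for speech (an assumption): two bursts of a 200 Hz tone
# centred at 0.25 s and 0.75 s, sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
envelope = np.exp(-((t - 0.25) / 0.05) ** 2) + np.exp(-((t - 0.75) / 0.05) ** 2)
signal = envelope * np.sin(2 * np.pi * 200 * t)

energy = short_term_energy(signal)
peaks = energy_peaks(energy)

# Convert frame indices to the time of each frame's centre, in seconds.
peak_times = (peaks * 160 + 200) / sr
print(peak_times)
```

In a pipeline like the one the abstract outlines, each candidate peak would then be described by spectral and energy features and passed to a learned classifier, with the peaks labelled as vowels serving as syllable nuclei.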


DOI: 10.21437/Interspeech.2019-1450

Cite as: Shankar, R., Venkataraman, A. (2019) Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification. Proc. Interspeech 2019, 644-648, DOI: 10.21437/Interspeech.2019-1450.


@inproceedings{Shankar2019,
  author={Ravi Shankar and Archana Venkataraman},
  title={{Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={644--648},
  doi={10.21437/Interspeech.2019-1450},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1450}
}