Speech and monophonic singing segmentation using pitch parameters

Xabier Sarasola, Eva Navas, David Tavarez, Luis Serrano, Ibon Saratxaga


In this paper we present a novel method for the automatic segmentation of speech and monophonic singing voice based on only two parameters derived from pitch: the proportion of voiced segments and the percentage of pitch frames labelled as musical notes. First, voice regions are located in the audio files using a GMM-HMM based VAD and the pitch is computed. Musical notes are then labelled automatically on the pitch curve by searching for sequences of stable pitch values. Finally, the pitch features extracted from each voice island are classified with Support Vector Machines. Our corpus consists of recordings of live sung poetry sessions whose audio files contain both singing and speech. The proposed system has been compared with other speech/singing discrimination systems and achieves good results.
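The two pitch descriptors can be sketched as follows. This is a minimal illustration, not the authors' implementation: the semitone reference (55 Hz), the stability tolerance, and the minimum run length are assumed parameters chosen for the example, and the stable-value search is a simple greedy approximation of the note-labelling step described in the abstract.

```python
import numpy as np

def hz_to_semitones(f0):
    # Convert voiced pitch values (Hz) to semitones relative to 55 Hz (A1).
    # Unvoiced frames (f0 <= 0) become NaN.
    st = np.full(len(f0), np.nan, dtype=float)
    voiced = f0 > 0
    st[voiced] = 12.0 * np.log2(f0[voiced] / 55.0)
    return st

def note_runs(semitones, tol=0.5, min_len=5):
    # Greedy stable-value sequence search: collect maximal runs of
    # consecutive voiced frames staying within +/- tol semitones of the
    # run's running mean; keep runs of at least min_len frames as notes.
    runs, start, mean, n = [], None, 0.0, 0
    for i, v in enumerate(semitones):
        if not np.isnan(v) and (n == 0 or abs(v - mean) <= tol):
            if n == 0:
                start = i
            mean = (mean * n + v) / (n + 1)
            n += 1
        else:
            if n >= min_len:
                runs.append((start, start + n))
            if np.isnan(v):          # unvoiced frame: no active run
                n = 0
            else:                    # voiced but unstable: restart run here
                start, mean, n = i, v, 1
    if n >= min_len:
        runs.append((start, start + n))
    return runs

def pitch_features(f0, tol=0.5, min_len=5):
    # The two descriptors, computed per voice island:
    #   1) proportion of voiced frames,
    #   2) fraction of voiced frames lying inside note runs.
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if voiced.sum() == 0:
        return 0.0, 0.0
    st = hz_to_semitones(f0)
    in_note = sum(e - s for s, e in note_runs(st, tol, min_len))
    return float(voiced.mean()), float(in_note / voiced.sum())
```

A held note yields a note fraction near 1, while continuously gliding (speech-like) pitch yields a fraction near 0, which is what makes the pair of features discriminative. In the paper these per-island features feed an SVM classifier; with scikit-learn that role could be filled by `sklearn.svm.SVC`.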


DOI: 10.21437/IberSPEECH.2018-31

Cite as: Sarasola, X., Navas, E., Tavarez, D., Serrano, L., Saratxaga, I. (2018) Speech and monophonic singing segmentation using pitch parameters. Proc. IberSPEECH 2018, 147-151, DOI: 10.21437/IberSPEECH.2018-31.


@inproceedings{Sarasola2018,
  author={Xabier Sarasola and Eva Navas and David Tavarez and Luis Serrano and Ibon Saratxaga},
  title={{Speech and monophonic singing segmentation using pitch parameters}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={147--151},
  doi={10.21437/IberSPEECH.2018-31},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-31}
}