Acoustic cues of prosodic stability in the perceptual distinction between speech and singing

João Cabral


In this work, we investigate how spoken and sung versions of the same text differ in the variability of duration and pitch. Listeners can perceptually distinguish speech from singing, especially when the audio is sufficiently long. However, we assume that the realisations of the two modalities partly overlap, which sometimes makes them more difficult to tell apart, while at other times they differ clearly. This raises the questions: Are speaking and singing completely different phenomena? Can we measure acoustic properties of these signals that demonstrate their differences, or possibly their similarities? We conducted different types of experiments in this project. One is based on stability measures of fundamental frequency (F0) and speech rate computed on spoken and sung versions of the same text (from Brazilian Portuguese popular songs). Another is a perceptual study of the differences between the two types of recordings. Initial results were unexpected, especially for F0 stability, which was not always significantly different between the two. However, later results were more supportive of the hypothesis that speech and singing can be differentiated in terms of acoustic properties, in particular results that can supplant the initial one related to F0 stability.


Cite as: Cabral, J. (2018) Acoustic cues of prosodic stability in the perceptual distinction between speech and singing. Proc. Workshop on Speech, Music and Mind 2018.


@inproceedings{Cabral2018,
  author={João Cabral},
  title={Acoustic cues of prosodic stability in the perceptual distinction between speech and singing},
  year=2018,
  booktitle={Proc. Workshop on Speech, Music and Mind 2018}
}