Towards Detection of Canonical Babbling by Citizen Scientists: Performance as a Function of Clip Length

Amanda Seidl, Anne S. Warlaumont, Alejandrina Cristia


Theoretical, empirical, and intervention research requires access to a large, unbiased, annotated dataset of infant vocalizations. Such a dataset could be used to train speech technology to detect consonant-vowel (canonical) syllables in infants' vocalizations and distinguish them from less mature vocalizations. Citizen scientists could help us build this dataset, provided that their classifications are accurate regardless of the coders' native language and training, and that the task can be completed on clips short enough to avoid revealing personally identifying information. Three groups of coders participated in an experiment: trained native, semi-trained native, and minimally-trained foreign. When vocalizations were presented whole, reliability was highest among the trained coders, with little difference between the semi-trained and minimally-trained coders. Among minimally-trained coders, reliability for 400 ms clips was very similar to that found for full clips, with lower values for 200 and 600 ms clips. Finally, error rates were lowest when 400 ms clips were used. In sum, minimally-trained coders can achieve fairly reliable and accurate results, even when their native language does not match the infants' target language and when they are given very short clips. Since shorter clips protect the identity of the child and her family, this manner of data annotation may provide a way to build a large, unbiased dataset of infant vocalizations.
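
The abstract does not state how vocalizations were cut into short clips or which reliability statistic was computed. Purely as an illustration, the sketch below assumes that a vocalization is split into consecutive, non-overlapping windows of a fixed length (200, 400, or 600 ms), with any shorter remainder dropped, and that agreement between two coders is measured with Cohen's kappa. The function names `clip_boundaries` and `cohens_kappa` and the label strings are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def clip_boundaries(duration_ms: int, clip_ms: int) -> list[tuple[int, int]]:
    """Split a vocalization lasting `duration_ms` into consecutive,
    non-overlapping (start, end) windows of exactly `clip_ms`.
    Assumption: a trailing remainder shorter than `clip_ms` is dropped."""
    if clip_ms <= 0:
        raise ValueError("clip_ms must be positive")
    return [(start, start + clip_ms)
            for start in range(0, duration_ms - clip_ms + 1, clip_ms)]


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two coders' labels for the same clips
    (Cohen's kappa is an assumed choice of reliability measure)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    # Proportion of clips on which the two coders gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label for every clip
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # A hypothetical 1000 ms vocalization cut into 400 ms clips.
    print(clip_boundaries(1000, 400))
    coder_1 = ["canonical", "non-canonical", "canonical", "canonical"]
    coder_2 = ["canonical", "non-canonical", "non-canonical", "canonical"]
    print(cohens_kappa(coder_1, coder_2))
```

With these toy inputs, the 1000 ms vocalization yields two 400 ms clips, (0, 400) and (400, 800), and the two coders agree on 3 of 4 clips against a chance level of 0.5, giving a kappa of 0.5. Whether remainders are dropped, padded, or kept as shorter clips is a design choice that changes how many clips each vocalization contributes.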


 DOI: 10.21437/Interspeech.2019-1773

Cite as: Seidl, A., Warlaumont, A.S., Cristia, A. (2019) Towards Detection of Canonical Babbling by Citizen Scientists: Performance as a Function of Clip Length. Proc. Interspeech 2019, 3579-3583, DOI: 10.21437/Interspeech.2019-1773.


@inproceedings{Seidl2019,
  author={Amanda Seidl and Anne S. Warlaumont and Alejandrina Cristia},
  title={{Towards Detection of Canonical Babbling by Citizen Scientists: Performance as a Function of Clip Length}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3579--3583},
  doi={10.21437/Interspeech.2019-1773},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1773}
}