A Framework for Automated Marmoset Vocalization Detection and Classification

Alan Wisler, Laura J. Brattain, Rogier Landman, Thomas F. Quatieri

This paper describes a novel framework for automated marmoset vocalization detection and classification from long audio streams recorded in a noisy animal room where multiple marmosets are housed. To overcome the challenge of limited manually annotated data, we implemented a data augmentation method that requires only a small number of labeled vocalizations. The chosen feature sets have the desirable property of capturing characteristics of the signals that are useful both for identifying and for distinguishing marmoset vocalizations. Unlike many previous methods, feature extraction, call detection, and call classification in our system are completely automated. The system maintains good performance on data containing a high number of noise events, achieving a 20% equal error rate for detection and a 15% classification error rate. Performance can be further improved with additional labeled training data. Because this extensible system is capable of identifying both positive and negative welfare indicators, it provides a powerful framework for non-human primate welfare monitoring as well as behavior assessment.
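The detection performance above is reported as an equal error rate (EER): the operating point at which the false-alarm rate equals the miss rate. As a hedged illustration (this is not the authors' code, just a minimal sketch of the standard metric), the EER can be estimated from detector scores by sweeping a threshold:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the equal error rate (EER): the threshold at which the
    false-alarm rate and the miss (false-rejection) rate are equal.

    scores: detector confidence per segment (higher = more call-like)
    labels: 1 for a true vocalization, 0 for a noise event
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    best_gap, eer = np.inf, 1.0
    # Sweep candidate thresholds drawn from the observed scores.
    for t in np.sort(np.unique(scores)):
        frr = np.mean(pos < t)   # misses: true calls scored below threshold
        far = np.mean(neg >= t)  # false alarms: noise scored at/above threshold
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

With perfectly separable scores the EER is 0; the 20% figure reported in the abstract corresponds to the threshold where roughly one in five true calls is missed and one in five noise events is falsely accepted.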

DOI: 10.21437/Interspeech.2016-1410

Cite as

Wisler, A., Brattain, L.J., Landman, R., Quatieri, T.F. (2016) A Framework for Automated Marmoset Vocalization Detection and Classification. Proc. Interspeech 2016, 2592-2596.

@inproceedings{wisler16_interspeech,
  author={Alan Wisler and Laura J. Brattain and Rogier Landman and Thomas F. Quatieri},
  title={A Framework for Automated Marmoset Vocalization Detection and Classification},
  booktitle={Interspeech 2016},
  year={2016},
  pages={2592--2596},
  doi={10.21437/Interspeech.2016-1410}
}