ISCA Archive Interspeech 2017

Discovering Language in Marmoset Vocalization

Sakshi Verma, K.L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema A. Murthy

Various studies suggest that marmosets (Callithrix jacchus) behave similarly to humans in many respects. Analyzing their calls would not only help us better understand this species but would also give insights into the evolution of human language and the vocal tract. This paper describes a technique to discover patterns in marmoset vocalization in an unsupervised fashion. The proposed unsupervised clustering approach operates in two stages. First, voice activity detection (VAD) is applied to remove silences and non-voiced regions from the audio. This is followed by group-delay-based segmentation of the voiced regions to obtain smaller segments. In the second stage, a two-tier clustering is performed on the segments obtained. An individual hidden Markov model (HMM) is built for each segment using multiple frame sizes and multiple frame rates. The HMMs are then clustered until each cluster is made up of a large number of segments. Once every cluster contains enough segments, one Gaussian mixture model (GMM) is built for each cluster. These clusters are then merged using the Kullback-Leibler (KL) divergence. The algorithm converges to the total number of distinct sounds in the audio, as evidenced by listening tests.
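To make the final KL-divergence merging step more concrete, here is a minimal sketch under simplifying assumptions that are not from the paper. Each cluster is summarised by a single diagonal-covariance Gaussian (the hypothetical `DiagGaussian` class) rather than a full GMM, because the KL divergence between two diagonal Gaussians has a closed form, whereas between GMMs it does not. The symmetric divergence, the greedy closest-pair merging, the moment-pooling merge rule, and the `threshold` stopping criterion are all illustrative choices, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class DiagGaussian:
    """Hypothetical stand-in for a cluster GMM: per-dimension mean and variance."""
    mean: List[float]
    var: List[float]
    count: int  # number of feature frames pooled into this cluster


def kl_diag(p: DiagGaussian, q: DiagGaussian) -> float:
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: DiagGaussian, q: DiagGaussian) -> float:
    """Symmetrised divergence, so the merge order does not depend on argument order."""
    return kl_diag(p, q) + kl_diag(q, p)


def pool(p: DiagGaussian, q: DiagGaussian) -> DiagGaussian:
    """Merge two clusters by pooling their count-weighted first and second moments."""
    n = p.count + q.count
    mean, var = [], []
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        m = (p.count * mp + q.count * mq) / n
        second = (p.count * (vp + mp ** 2) + q.count * (vq + mq ** 2)) / n
        mean.append(m)
        var.append(second - m ** 2)
    return DiagGaussian(mean, var, n)


def merge_clusters(clusters: List[DiagGaussian], threshold: float) -> List[DiagGaussian]:
    """Greedily merge the closest pair until no pair is closer than `threshold` (assumed rule)."""
    clusters = list(clusters)
    while len(clusters) > 1:
        (i, j), dist = min(
            (((x, y), symmetric_kl(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist >= threshold:
            break  # remaining clusters are treated as distinct sounds
        merged = pool(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

For example, two clusters with nearly identical statistics (means `[0.0, 0.0]` and `[0.1, 0.0]`, unit variances) are merged at `threshold=1.0`, while a third cluster centred at `[5.0, 5.0]` remains separate, leaving two clusters. In the approach described above the number of surviving clusters is what should approach the number of distinct sounds; in this sketch that count depends directly on the assumed threshold.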

doi: 10.21437/Interspeech.2017-842

Cite as: Verma, S., Prateek, K.L., Pandia, K., Dawalatabad, N., Landman, R., Sharma, J., Sur, M., Murthy, H.A. (2017) Discovering Language in Marmoset Vocalization. Proc. Interspeech 2017, 2426-2430, doi: 10.21437/Interspeech.2017-842

  author={Sakshi Verma and K.L. Prateek and Karthik Pandia and Nauman Dawalatabad and Rogier Landman and Jitendra Sharma and Mriganka Sur and Hema A. Murthy},
  title={{Discovering Language in Marmoset Vocalization}},
  booktitle={Proc. Interspeech 2017},