Deep Learning Techniques for Koala Activity Detection

Ivan Himawan, Michael Towsey, Bradley Law, Paul Roe


Automatically detecting koalas in real-life environments from audio recordings will immensely help ecologists, conservation groups and government departments interested in their preservation and the protection of their habitat. Inspired by the success of deep learning approaches in various audio classification tasks, this paper studies the feasibility of recognizing koala calls using a convolutional recurrent neural network architecture (CNN+RNN). The benefit of this architecture is twofold: firstly, convolutional layers learn local time-frequency patterns from the audio spectrogram, and secondly, recurrent layers model longer temporal dependencies of the extracted features. On our datasets, the performance of CNN+RNN is evaluated and compared with standard convolutional neural networks (CNNs). The experimental results show that the hybrid CNN+RNN architecture is beneficial for learning the long-term spectrogram patterns exhibited by koala calls in unseen conditions. The proposed method is also applicable to detecting other animal calls such as bird sounds, where it achieves an 87.46% area-under-curve score on the Bird Audio Detection Challenge evaluation data.
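The CNN+RNN idea described above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration only (layer counts, channel widths, and the GRU hidden size are assumptions, not the paper's configuration): convolutional layers extract local time-frequency patterns from a spectrogram, a recurrent layer models longer temporal dependencies across frames, and a sigmoid head produces a per-clip call/no-call probability.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a convolutional recurrent network for call detection.

    All layer sizes are illustrative assumptions, not the paper's setup.
    """

    def __init__(self, n_mels=64, conv_ch=32, rnn_hidden=64):
        super().__init__()
        # CNN front end: learns local time-frequency patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((2, 1)),  # pool frequency only, keep time resolution
            nn.Conv2d(conv_ch, conv_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        # RNN back end: models longer temporal dependencies of CNN features.
        self.rnn = nn.GRU(conv_ch * (n_mels // 4), rnn_hidden, batch_first=True)
        self.head = nn.Linear(rnn_hidden, 1)

    def forward(self, spec):
        # spec: (batch, 1, n_mels, time) log-mel spectrogram
        x = self.conv(spec)                     # (batch, ch, n_mels//4, time)
        x = x.permute(0, 3, 1, 2).flatten(2)    # (batch, time, ch * n_mels//4)
        _, h = self.rnn(x)                      # h: (1, batch, rnn_hidden)
        return torch.sigmoid(self.head(h[-1]))  # (batch, 1) call probability

model = CRNN()
spec = torch.randn(2, 1, 64, 100)  # two fake 64-mel clips, 100 time frames
probs = model(spec)
print(probs.shape)  # torch.Size([2, 1])
```

Pooling only along the frequency axis is one common design choice for such hybrids: it preserves the full frame rate so the recurrent layer can model fine temporal structure in the calls.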


DOI: 10.21437/Interspeech.2018-1143

Cite as: Himawan, I., Towsey, M., Law, B., Roe, P. (2018) Deep Learning Techniques for Koala Activity Detection. Proc. Interspeech 2018, 2107-2111, DOI: 10.21437/Interspeech.2018-1143.


@inproceedings{Himawan2018,
  author={Ivan Himawan and Michael Towsey and Bradley Law and Paul Roe},
  title={Deep Learning Techniques for Koala Activity Detection},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2107--2111},
  doi={10.21437/Interspeech.2018-1143},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1143}
}