Semi-Supervised Audio Classification with Consistency-Based Regularization

Kangkang Lu, Chuan-Sheng Foo, Kah Kuan Teh, Huy Dat Tran, Vijay Ramaseshan Chandrasekhar

Consistency-based semi-supervised learning methods such as the Mean Teacher method are state-of-the-art on image datasets but have yet to be applied to audio data. Such methods encourage a model's predictions to remain consistent when its inputs are perturbed. In this paper, we incorporate audio-specific perturbations into the Mean Teacher algorithm and demonstrate the effectiveness of the resulting method on audio classification tasks. Specifically, we perturb audio inputs by mixing in other environmental audio clips, and we leverage other training examples as sources of noise. Experiments on the Google Speech Command Dataset and the UrbanSound8K dataset show that the method can achieve performance comparable to that of a purely supervised approach while using only a fraction of the labels.
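The sketch below may help clarify how the pieces described above fit together. It is a minimal NumPy illustration and not the authors' implementation. Several details are assumptions: the helper names (`mix_perturb`, `batch_mix`, `consistency_loss`, `ema_update`), the SNR-based scaling of the mixed-in clip, the random cyclic pairing of batch examples, and the toy linear classifier that stands in for the real audio network. The consistency term (squared difference of student and teacher class probabilities) and the exponential-moving-average teacher follow the general Mean Teacher recipe. The supervised cross-entropy term on labeled clips and the gradient step are omitted.

```python
import numpy as np


def mix_perturb(clip, noise_clip, snr_db, rng):
    """Add a random segment of `noise_clip` to `clip` at `snr_db` (assumed SNR scaling)."""
    # Tile short noise clips so a segment of the right length always exists.
    if len(noise_clip) < len(clip):
        reps = int(np.ceil(len(clip) / len(noise_clip)))
        noise_clip = np.tile(noise_clip, reps)
    start = rng.integers(0, len(noise_clip) - len(clip) + 1)
    noise = noise_clip[start:start + len(clip)]
    # Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db.
    sig_power = np.mean(clip ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return clip + scale * noise


def batch_mix(batch, rng, snr_range=(5.0, 20.0)):
    """Use other training examples in the batch as the noise source (needs >= 2 clips)."""
    # A random non-zero cyclic shift pairs each clip with a different clip
    # (a hypothetical pairing scheme); each pair gets its own random SNR.
    shift = int(rng.integers(1, len(batch)))
    partners = np.roll(batch, shift=shift, axis=0)
    return np.stack([mix_perturb(c, n, rng.uniform(*snr_range), rng)
                     for c, n in zip(batch, partners)])


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict(params, x):
    """Toy linear classifier standing in for the real audio network."""
    return softmax(x @ params["W"] + params["b"])


def consistency_loss(student, teacher, x_student, x_teacher):
    """Mean squared difference between student and teacher class probabilities."""
    diff = predict(student, x_student) - predict(teacher, x_teacher)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def ema_update(teacher, student, alpha=0.99):
    """Teacher weights track an exponential moving average of the student weights."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_classes = 100, 10  # toy sizes, not the paper's settings
    student = {"W": rng.normal(size=(n_samples, n_classes)) * 0.01,
               "b": np.zeros(n_classes)}
    teacher = {k: v.copy() for k, v in student.items()}
    unlabeled = rng.normal(size=(8, n_samples))  # stand-in for unlabeled clips
    # Student and teacher see independently perturbed versions of the same clips.
    loss = consistency_loss(student, teacher,
                            batch_mix(unlabeled, rng),
                            batch_mix(unlabeled, rng))
    # In real training the student would take a gradient step here first.
    teacher = ema_update(teacher, student)
    print(f"consistency loss: {loss:.6f}")
```

Because the teacher is an average of past student weights rather than a separately trained model, it tends to give smoother targets. The consistency term then only asks the student to agree with those targets under a different perturbation, which requires no labels.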

DOI: 10.21437/Interspeech.2019-1231

Cite as: Lu, K., Foo, C., Teh, K.K., Tran, H.D., Chandrasekhar, V.R. (2019) Semi-Supervised Audio Classification with Consistency-Based Regularization. Proc. Interspeech 2019, 3654-3658, DOI: 10.21437/Interspeech.2019-1231.
