Stochastic Shake-Shake Regularization for Affective Learning from Speech

Che-Wei Huang, Shrikanth Narayanan


We propose stochastic Shake-Shake regularization, based on multi-branch residual architectures, to mitigate over-fitting in affective learning from speech. Inspired by the recent Shake-Shake [1] and ShakeDrop [2] regularization techniques, we introduce negative scaling into the Shake-Shake regularization algorithm while still maintaining a consistent stochastic convex combination of branches, which encourages diversity among branches whether they are scaled by positive or negative coefficients. In addition, we employ the idea of stochastic depth to randomly relax the shaking mechanism during training as a way to control the strength of regularization. Through experiments on speech emotion recognition with various levels of regularization strength, we find that the shaking mechanism alone contributes much more to constraining the optimization of network parameters than to boosting generalization power. Stochastically relaxing the shaking regularization, however, offers a convenient way to strike a balance between the two. With a flexible configuration of hybrid layers, promising experimental results demonstrate a higher unweighted accuracy and a smaller gap between training and validation, i.e., reduced over-fitting, and shed light on future directions for low-resource pattern recognition tasks.
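
The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes that negative scaling is a random sign applied to a convex combination of two branch outputs, and that relaxing the shaking means falling back to the fixed average with weight 0.5. The function name `shake_combine` and the parameters `shake_prob` and `negative_prob` are hypothetical, chosen only for illustration.

```python
import random


def shake_combine(branch_a, branch_b, shake_prob=0.5, negative_prob=0.5,
                  training=True, rng=random):
    """Combine two residual-branch outputs (lists of floats) into one list.

    Assumed reading of the abstract, not the paper's exact formulation:
    - with probability `shake_prob` (training only), draw a random convex
      weight alpha and a random sign, then return
      sign * (alpha * a + (1 - alpha) * b) element-wise;
    - otherwise the shaking is relaxed and the plain average is returned.
    """
    if training and rng.random() < shake_prob:
        alpha = rng.random()  # convex weight in [0, 1)
        # Hypothetical form of negative scaling: flip the sign of the
        # whole convex combination with probability `negative_prob`.
        sign = -1.0 if rng.random() < negative_prob else 1.0
    else:
        # Relaxed (or inference) case: deterministic average of branches.
        alpha, sign = 0.5, 1.0
    return [sign * (alpha * a + (1.0 - alpha) * b)
            for a, b in zip(branch_a, branch_b)]
```

Under these assumptions, `shake_prob` plays the role of the regularization-strength control mentioned in the abstract: at 0.0 every layer uses the plain average, and at 1.0 every training step is shaken.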


DOI: 10.21437/Interspeech.2018-1327

Cite as: Huang, C., Narayanan, S. (2018) Stochastic Shake-Shake Regularization for Affective Learning from Speech. Proc. Interspeech 2018, 3658-3662, DOI: 10.21437/Interspeech.2018-1327.


@inproceedings{Huang2018,
  author={Che-Wei Huang and Shrikanth Narayanan},
  title={Stochastic Shake-Shake Regularization for Affective Learning from Speech},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3658--3662},
  doi={10.21437/Interspeech.2018-1327},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1327}
}