ISCA Archive Interspeech 2021

Scaling Effect of Self-Supervised Speech Models

Jie Pu, Yuguang Yang, Ruirui Li, Oguz Elibol, Jasha Droppo

The success of modern deep learning systems is built on two cornerstones: massive amounts of annotated training data and advanced computational infrastructure to support large-scale computation. In recent years, the model size of state-of-the-art deep learning systems has increased rapidly, sometimes reaching billions of parameters. Herein we take a close look at this phenomenon and present an empirical study on the scaling effect of model size for self-supervised speech models. In particular, we investigate the quantitative relationship between model size and loss/accuracy performance on speech tasks. First, the power-law scaling property between the number of parameters and the L1 self-supervised loss is verified for speech models. Then the advantage of large speech models in learning effective speech representations is demonstrated on two downstream tasks: i) speaker recognition and ii) phoneme classification. Moreover, we show that the model size of self-supervised speech networks can compensate for the lack of annotation when training data is insufficient.
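The power-law scaling mentioned in the abstract is commonly written as L(N) ≈ (N_c / N)^α, where N is the parameter count. As a minimal sketch of how such an exponent can be estimated, the snippet below fits a line in log-log space to synthetic (N, loss) points; the constants `alpha_true` and `n_c` are made-up illustrative values, not figures from the paper.

```python
import numpy as np

# Hypothetical power law L(N) = (N_c / N)^alpha; constants are
# illustrative assumptions, not values reported in the paper.
alpha_true, n_c = 0.07, 8.8e13

# Synthetic (parameter count, loss) points lying exactly on that curve.
n_params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
loss = (n_c / n_params) ** alpha_true

# A power law is a straight line in log-log space:
# log L = -alpha * log N + alpha * log N_c, so the slope is -alpha.
slope, intercept = np.polyfit(np.log(n_params), np.log(loss), 1)
alpha_est = -slope  # recovered scaling exponent

print(round(alpha_est, 3))
```

On real measurements the points will not sit exactly on the line, so the fitted slope is an estimate; here the synthetic data recovers `alpha_true` almost exactly.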

doi: 10.21437/Interspeech.2021-1935

Cite as: Pu, J., Yang, Y., Li, R., Elibol, O., Droppo, J. (2021) Scaling Effect of Self-Supervised Speech Models. Proc. Interspeech 2021, 1084-1088, doi: 10.21437/Interspeech.2021-1935

@inproceedings{pu21_interspeech,
  author={Jie Pu and Yuguang Yang and Ruirui Li and Oguz Elibol and Jasha Droppo},
  title={{Scaling Effect of Self-Supervised Speech Models}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1084--1088},
  doi={10.21437/Interspeech.2021-1935}
}