ISCA Archive Interspeech 2021

Layer Pruning on Demand with Intermediate CTC

Jaesong Lee, Jingu Kang, Shinji Watanabe

Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded devices is challenging, since the device's computational power and energy budget change dynamically in practice. To overcome this issue, we present a training and pruning method for ASR based on connectionist temporal classification (CTC) that allows the model depth to be reduced at run-time without any extra fine-tuning. To achieve this, we adopt two regularization methods, intermediate CTC and stochastic depth, to train a model whose performance does not degrade much after pruning. We present an in-depth analysis of layer behaviors using singular vector canonical correlation analysis (SVCCA), and efficient strategies for finding layers which are safe to prune. Using the proposed method, we show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU, while each pruned sub-model maintains the accuracy of an individually trained model of the same depth.
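The two regularizers named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the interpolation weight, survival probability, and function names are illustrative assumptions.

```python
import random

def combined_ctc_loss(final_loss, intermediate_losses, w=0.3):
    """Intermediate CTC (sketch): interpolate the final-layer CTC loss
    with the mean of CTC losses attached to intermediate encoder layers.
    The weight w=0.3 is an assumed value, not taken from the paper."""
    inter = sum(intermediate_losses) / len(intermediate_losses)
    return (1 - w) * final_loss + w * inter

def stochastic_depth_forward(x, layers, survival_p=0.7, rng=None):
    """Stochastic depth (sketch): during training, each layer is skipped
    (replaced by identity) with probability 1 - survival_p, so the model
    learns to tolerate missing layers -- the property that later makes
    run-time layer pruning safe. survival_p=0.7 is an assumed value."""
    rng = rng or random.Random(0)
    for layer in layers:
        if rng.random() < survival_p:
            x = layer(x)
    return x
```

At inference time, pruning a layer corresponds to skipping it deterministically, which is why no extra fine-tuning is needed after removal.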

doi: 10.21437/Interspeech.2021-1171

Cite as: Lee, J., Kang, J., Watanabe, S. (2021) Layer Pruning on Demand with Intermediate CTC. Proc. Interspeech 2021, 3745-3749, doi: 10.21437/Interspeech.2021-1171

@inproceedings{lee21_interspeech,
  author={Jaesong Lee and Jingu Kang and Shinji Watanabe},
  title={{Layer Pruning on Demand with Intermediate CTC}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3745--3749},
  doi={10.21437/Interspeech.2021-1171}
}