A main advantage of the deep neural network (DNN) model lies in the fact that
no artificial assumptions are placed on the data distribution or the model structure,
which makes it possible to learn highly flexible models. This flexibility,
however, may lead to highly redundant parameters, and hence to demanding
computation and a risk of over-fitting. Network pruning cuts off unimportant
connections, and can therefore be used to produce parsimonious models that generalize well.
This paper proposes to utilize optimal brain damage (OBD) to conduct DNN pruning. OBD computes connection saliency from the Hessian of the loss function, and is thus theoretically sound and reliable in practice. We present our implementation of OBD for DNNs, and demonstrate that OBD pruning can produce very sparse DNNs while largely retaining the discriminative power of the original network. By comparing with simple magnitude-based pruning, we find that for weakly pruned networks the choice of pruning method matters little, since retraining can largely recover the performance loss caused by pruning; for highly pruned networks, however, sophisticated pruning methods such as OBD are clearly superior.
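To make the contrast between the two pruning criteria concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the diagonal of the Hessian is already available (in practice it would come from a second-order backward pass) and uses the classic OBD saliency s_i = ½ H_ii w_i²; the function names and the keep-fraction interface are illustrative choices.

```python
import numpy as np

def obd_prune_mask(weights, hessian_diag, keep_fraction):
    """Binary mask keeping the most salient weights under the OBD criterion.

    OBD saliency: s_i = 0.5 * H_ii * w_i**2, where H_ii is the diagonal
    of the loss Hessian w.r.t. weight w_i. `hessian_diag` is assumed to be
    precomputed; obtaining it is the expensive part of OBD in practice.
    """
    saliency = 0.5 * hessian_diag * weights ** 2
    k = int(keep_fraction * weights.size)
    # Keep the k weights with the largest saliency; prune the rest.
    threshold = np.partition(saliency.ravel(), -k)[-k]
    return (saliency >= threshold).astype(weights.dtype)

def magnitude_prune_mask(weights, keep_fraction):
    """Baseline: keep the k largest-magnitude weights, ignoring curvature."""
    k = int(keep_fraction * weights.size)
    threshold = np.partition(np.abs(weights).ravel(), -k)[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)
```

The difference matters exactly in the highly pruned regime discussed above: a small weight sitting in a high-curvature direction (large H_ii) is salient under OBD but discarded by the magnitude criterion.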
Bibliographic reference. Liu, Chao / Zhang, Zhiyong / Wang, Dong (2014): "Pruning deep neural networks by optimal brain damage", In INTERSPEECH-2014, 1092-1095.