INTERSPEECH 2013

It has now been established that incorporating neural networks can be useful for speech recognition, and that machine learning methods can make it practical to incorporate a larger number of hidden layers in a "deep" structure. Here we incorporate the constraint of freezing the number of parameters for a given task, which in many applications corresponds to practical limitations on storage or computation. Given this constraint, we vary the size of each hidden layer as we change the number of layers so as to keep the total number of parameters constant. In this way we have determined, for a common task of noisy speech recognition (Aurora2), that a large number of layers is not always optimum; for each noise level there is an optimum number of layers. We also use state-of-the-art optimization algorithms to further understand the effect of initialization and convergence properties of such networks, and to have an efficient implementation that allows us to run more experiments with a standard desktop machine with a single GPU.
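
The fixed-budget constraint described above can be illustrated with a small sketch. It assumes a fully connected network with bias terms whose hidden layers all share one width; the abstract does not specify the exact architecture, and the function names and the dimensions in the usage lines (351 inputs, 100 outputs, a budget of one million parameters) are hypothetical values chosen only for illustration, not figures from the paper.

```python
def mlp_param_count(n_in, n_out, n_hidden_layers, width):
    """Weights plus biases of a fully connected net with equal-width hidden layers."""
    sizes = [n_in] + [width] * n_hidden_layers + [n_out]
    # Each layer mapping a units to b units has a*b weights and b biases.
    return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))


def width_for_budget(n_in, n_out, n_hidden_layers, budget):
    """Largest hidden width whose parameter count does not exceed the budget."""
    if n_hidden_layers < 1:
        raise ValueError("need at least one hidden layer")
    if mlp_param_count(n_in, n_out, n_hidden_layers, 1) > budget:
        raise ValueError("budget too small for this depth")
    # Grow an upper bound, then bisect; the count is increasing in width.
    lo, hi = 1, 2
    while mlp_param_count(n_in, n_out, n_hidden_layers, hi) <= budget:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mlp_param_count(n_in, n_out, n_hidden_layers, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical dimensions and budget, for illustration only.
N_IN, N_OUT, BUDGET = 351, 100, 1_000_000
for depth in range(1, 8):
    w = width_for_budget(N_IN, N_OUT, depth, BUDGET)
    print(depth, w, mlp_param_count(N_IN, N_OUT, depth, w))
```

Under these assumptions, each added hidden layer forces the shared width down, which is the deep-versus-wide trade-off the experiments explore.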
Bibliographic reference. Vinyals, Oriol / Morgan, Nelson (2013): "Deep vs. wide: depth on a budget for robust speech recognition", In INTERSPEECH-2013, 114-118.