Convolutional neural networks have recently been shown to outperform fully connected deep neural networks on several speech recognition tasks. Their superior performance stems from their convolutional structure, which applies the same weights to several slightly shifted versions of the input window and then pools the resulting neural activations. This pooling operation makes the network less sensitive to translations. The convolutional network results published so far used sigmoid or rectified linear neurons. Recently, however, a new type of activation function called the maxout activation has been proposed. Its operation is closely related to that of convolutional networks, as it applies a similar pooling step, but over different neurons evaluated on the same input rather than over the same neurons evaluated on shifted inputs. Here, we combine the two techniques and experiment with deep convolutional neural networks built from maxout neurons. Phone recognition tests on the TIMIT database show that switching from rectifier units to maxout units decreases the phone error rate for every network configuration studied, yielding relative error rate reductions of between 2% and 6%.
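The relationship between the two pooling steps described above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in the paper: the function names, the group size, the pool size, and the toy inputs are all assumptions chosen for illustration. The first function pools over different linear units evaluated on the same input (maxout), while the second pools over the same filter evaluated on shifted input windows (convolution followed by max pooling).

```python
def maxout_layer(x, weights, biases, group_size):
    """Maxout sketch: evaluate several linear units on the SAME input,
    then keep the maximum within each group of `group_size` units."""
    linear = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return [
        max(linear[i:i + group_size])
        for i in range(0, len(linear), group_size)
    ]


def conv_max_pool(x, kernel, bias, pool_size):
    """Convolutional sketch: apply the SAME weights to shifted windows of
    the input, then max-pool over non-overlapping runs of positions."""
    k = len(kernel)
    activations = [
        sum(w * v for w, v in zip(kernel, x[s:s + k])) + bias
        for s in range(len(x) - k + 1)
    ]
    return [
        max(activations[i:i + pool_size])
        for i in range(0, len(activations) - pool_size + 1, pool_size)
    ]


# Toy values (hypothetical, chosen only so the arithmetic is easy to follow).
maxout_out = maxout_layer(
    x=[1.0, 2.0, 3.0, 4.0],
    weights=[
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, -1.0],
    ],
    biases=[0.0, 0.0, 0.0, 0.0],
    group_size=2,
)
conv_out = conv_max_pool(
    x=[1.0, 3.0, 2.0, 5.0, 4.0],
    kernel=[1.0, -1.0],
    bias=0.0,
    pool_size=2,
)
print(maxout_out)  # [2.0, 3.0]
print(conv_out)    # [1.0, 1.0]
```

In both functions the nonlinearity is simply a maximum over a group of linear activations. The only difference is what the group ranges over: distinct weight vectors in the maxout case, and distinct input positions in the convolutional case.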
Bibliographic reference. Tóth, László (2014): "Convolutional deep maxout networks for phone recognition", In INTERSPEECH-2014, 1078-1082.