12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Deep Convex Net: A Scalable Architecture for Speech Pattern Classification

Li Deng, Dong Yu

Microsoft Research, USA

We recently developed a context-dependent DNN-HMM (Deep-Neural-Net/Hidden-Markov-Model) system for large-vocabulary speech recognition. While it achieves impressive reductions in recognition error rate, we face a serious scalability problem in dealing with the virtually unlimited amounts of training data available nowadays. To overcome this challenge, we have designed the deep convex network (DCN) architecture, in which the learning problem is convex within each module. Additional structure-exploiting fine tuning further improves the quality of the DCN. Learning in the DCN is entirely batch-mode rather than stochastic, making it naturally amenable to parallel training distributed over many machines. Experimental results on both the MNIST and TIMIT tasks evaluated thus far demonstrate superior performance of the DCN over its DBN (Deep Belief Network) counterpart, which forms the basis of the DNN. The superiority is reflected not only in training scalability and CPU-only computation, but more importantly in classification accuracy on both tasks.
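The abstract's central claim is that learning within each DCN module is convex. A minimal sketch of one common reading of this design follows, with assumptions labeled: each module is a single-hidden-layer network whose lower-layer weights are fixed (random here, though the paper's fine tuning would adjust them), so the upper-layer weights reduce to a regularized least-squares problem with a closed-form solution; modules are stacked by concatenating the raw input with the previous module's predictions. All names, dimensions, and hyperparameters below are illustrative, not the authors' exact recipe.

```python
import numpy as np

def fit_dcn(X, T, n_modules=3, n_hidden=16, reg=1e-3, seed=0):
    """Sketch of module-wise DCN training (assumed formulation).

    X: (d, n) input features; T: (c, n) target codes (e.g. one-hot).
    Per module: hidden activations H = sigmoid(W Z); upper weights U
    solve the convex ridge-regression problem min ||U H - T||^2 + reg||U||^2,
    which has the closed form U = T H^T (H H^T + reg I)^{-1}.
    """
    rng = np.random.default_rng(seed)
    modules = []
    Z = X  # first module sees only the raw input
    for _ in range(n_modules):
        W = 0.1 * rng.standard_normal((n_hidden, Z.shape[0]))  # lower weights (fixed here)
        H = 1.0 / (1.0 + np.exp(-W @ Z))                       # hidden activations
        # Convex step, in closed form via a linear solve (batch-mode, no SGD):
        A = np.linalg.solve(H @ H.T + reg * np.eye(n_hidden), H @ T.T)
        U = A.T                                                # upper weights (c, n_hidden)
        Y = U @ H                                              # this module's predictions
        modules.append((W, U))
        Z = np.vstack([X, Y])  # stacking: next module sees input + predictions
    return modules

def predict_dcn(modules, X):
    """Forward pass through the stacked modules; returns final predictions."""
    Z = X
    Y = None
    for W, U in modules:
        H = 1.0 / (1.0 + np.exp(-W @ Z))
        Y = U @ H
        Z = np.vstack([X, Y])
    return Y
```

Because each module's upper-layer solve touches the whole batch at once, the per-module Gram matrix `H H^T` can be accumulated over data shards on separate machines and summed, which is the sense in which batch-mode convex learning parallelizes naturally.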

Full Paper

Bibliographic reference. Deng, Li / Yu, Dong (2011): "Deep convex net: a scalable architecture for speech pattern classification", in INTERSPEECH-2011, 2285-2288.