15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Restructuring Output Layers of Deep Neural Networks Using Minimum Risk Parameter Clustering

Yotaro Kubo (1), Jun Suzuki (2), Takaaki Hori (2), Atsushi Nakamura (3)

(1) Amazon, Germany
(2) NTT Corporation, Japan
(3) Nagoya City University, Japan

This paper attempts to optimize the topology of hidden Markov models (HMMs) for automatic speech recognition. Current state-of-the-art acoustic models for ASR involve HMMs with deep neural network (DNN)-based emission density functions. Even though DNN parameters are typically trained by optimizing a discriminative criterion, topology optimization of HMMs is usually performed by optimizing a generative criterion. Several approaches have been studied to achieve discriminative state clustering; however, these approaches typically assume underlying Gaussian distributions of the acoustic features and are not compatible with DNN-based emission density functions. In this paper, we derive a discriminative method for restructuring an HMM topology by introducing discriminative optimization with discrete constraints on the parameters, which force the parameters of each state to be tied with those of other states. By applying this constrained optimization to the clustering of parameters of DNN-based acoustic models, we derive a discriminative HMM restructuring method that maintains the discriminative performance of the original HMMs with a large number of states.
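The core idea of the restructuring step can be sketched as clustering the rows of the DNN output-layer weight matrix and tying each state's row to its cluster centroid, so that many HMM states share a smaller set of distinct output parameters. The sketch below is a minimal illustration using plain k-means with deterministic initialization as a stand-in for the paper's minimum risk clustering criterion; the function name `tie_output_rows` and the toy data are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def tie_output_rows(W, num_clusters, num_iters=20):
    """Cluster the rows of an output-layer weight matrix W
    (num_states x hidden_dim) and tie each row to its cluster
    centroid. Plain k-means stands in here for the minimum risk
    criterion used in the paper (a simplification)."""
    # deterministic init: pick evenly spaced rows as initial centroids
    idx = np.linspace(0, len(W) - 1, num_clusters).astype(int)
    centroids = W[idx].copy()
    assign = np.zeros(len(W), dtype=int)
    for _ in range(num_iters):
        # assign each row (state) to its nearest centroid
        dists = np.linalg.norm(W[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # update each centroid as the mean of its assigned rows
        for k in range(num_clusters):
            members = W[assign == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    tied_W = centroids[assign]  # every row replaced by its shared centroid
    return tied_W, assign

# toy example: 8 "states" whose weight rows form 2 underlying groups
W = np.vstack([np.ones((4, 3)) + 0.01, -np.ones((4, 3))])
tied_W, assign = tie_output_rows(W, num_clusters=2)
print(len(np.unique(assign)))  # number of distinct tied parameter sets
```

In the paper this clustering is driven by a minimum risk (discriminative) objective with discrete tying constraints rather than Euclidean distance, so the sketch only conveys the structure of the output-layer restructuring, not the training criterion.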


Bibliographic reference.  Kubo, Yotaro / Suzuki, Jun / Hori, Takaaki / Nakamura, Atsushi (2014): "Restructuring output layers of deep neural networks using minimum risk parameter clustering", In INTERSPEECH-2014, 1068-1072.