Recently, deep neural networks (DNNs) have been attracting the attention of speech researchers because of their capability to handle nonlinearity in speech feature vectors. Meanwhile, speech recognition based on structured classification is also considered important because it successfully exploits the interdependency of several information sources. In this paper, we focus on a structured classification method based on weighted finite-state transducers (WFSTs) that introduces a linear classification term into each arc transition cost of the decoding network to capture contextual information about labels. Since these two approaches attempt to improve the representation of features and labels, respectively, they are complementary, and combining them is expected to be effective. Thus, this paper proposes a method that combines deep neural network techniques with WFST-based structured classification approaches. In the proposed method, DNNs are used to extract classification-friendly features, which are then classified by WFST-based structured classifiers. The proposed method is evaluated on the TIMIT continuous phoneme recognition task. We confirmed that combining structured classification leads to stable performance improvements even over well-optimized deep neural network acoustic models.
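The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single tanh layer standing in for the DNN, the sign convention of the linear term, and the toy arc list are all assumptions made for illustration. A practical system would use a WFST toolkit, trained weights, and beam search.

```python
import math

def dnn_features(x, W, b):
    """Hypothetical one-layer stand-in for the DNN feature extractor: tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def arc_cost(base_cost, arc_weights, phi):
    """Arc transition cost: a static cost plus a linear term on the features.

    Assumption: the linear term is added to the cost, so lower is better.
    """
    return base_cost + sum(w * f for w, f in zip(arc_weights, phi))

def viterbi(frames, arcs, start=0):
    """Minimum-cost label sequence over a toy decoding network.

    frames: list of feature vectors, one per time step.
    arcs:   list of (src, dst, label, base_cost, arc_weights) tuples.
    Every arc consumes exactly one frame (no epsilon arcs in this sketch).
    Returns (total_cost, labels) for the cheapest path from `start`.
    """
    best = {start: (0.0, [])}  # state -> (cost so far, labels so far)
    for phi in frames:
        nxt = {}
        for src, dst, label, base, weights in arcs:
            if src not in best:
                continue
            cost = best[src][0] + arc_cost(base, weights, phi)
            if dst not in nxt or cost < nxt[dst][0]:
                nxt[dst] = (cost, best[src][1] + [label])
        best = nxt
    return min(best.values(), key=lambda entry: entry[0])
```

Because each arc carries its own weight vector, the same feature vector can be scored differently depending on the transition being taken, which is how label context enters the cost in this sketch.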
Index Terms: Speech recognition, deep neural networks, structured classification, weighted finite-state transducers
Bibliographic reference. Kubo, Yotaro / Hori, Takaaki / Nakamura, Atsushi (2012): "Integrating deep neural networks into structural classification approach based on weighted finite-state transducers", In INTERSPEECH-2012, 2594-2597.