14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Subspace Models for Bottleneck Features

Jun Qi (1), Dong Wang (1), Javier Tejedor (2)

(1) Tsinghua University, China
(2) Universidad Autónoma de Madrid, Spain

The bottleneck (BN) feature, particularly when based on deep structures, has achieved significant success in automatic speech recognition (ASR). However, applying the BN feature to small/medium-scale tasks is nontrivial. An obvious reason is that limited training data prevents the training of a complicated deep network; another, more subtle reason is that the BN feature tends to possess high inter-dimensional correlation, which makes it ill-suited to modeling with the conventional diagonal Gaussian mixture model (GMM). This difficulty can be mitigated by increasing the number of Gaussian components and/or employing full covariance matrices. These approaches, however, are not applicable to small/medium-scale tasks, for which only a limited amount of training data is available. In this paper, we study the subspace Gaussian mixture model (SGMM) for BN features. The SGMM assumes full but shared covariance matrices, and hence can address the inter-dimensional correlation in a parsimonious way. This is particularly attractive for the BN feature, especially on small/medium-scale tasks, where the inter-dimensional correlation is high but full covariance modeling is not affordable due to the limited training data. Our preliminary experiments on the Resource Management (RM) database demonstrate that the SGMM can deliver significant performance improvement for ASR systems based on BN features.
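
As a rough illustration of why full but shared covariances are parsimonious, the sketch below counts free parameters for three acoustic models. It assumes the commonly used SGMM parameterisation, which the abstract does not spell out: each state j has a single vector v_j, and each shared Gaussian index i has a projection matrix M_i, a weight vector w_i, and a full covariance. The function names and all model sizes are hypothetical and are not taken from the paper.

```python
def diag_gmm_params(num_states, num_comp, dim):
    """Diagonal-covariance GMM: every state owns all of its components."""
    # Per component: mean (dim) + diagonal variance (dim) + mixture weight (1).
    return num_states * num_comp * (2 * dim + 1)


def full_gmm_params(num_states, num_comp, dim):
    """Full-covariance GMM: every state owns full covariance matrices."""
    # Per component: mean (dim) + symmetric covariance + mixture weight (1).
    cov = dim * (dim + 1) // 2
    return num_states * num_comp * (dim + cov + 1)


def sgmm_params(num_states, num_shared, dim, subspace_dim):
    """SGMM (assumed form): full covariances are shared across all states."""
    cov = dim * (dim + 1) // 2
    # Shared per Gaussian index: M_i (dim x subspace_dim), w_i, full covariance.
    shared = num_shared * (dim * subspace_dim + subspace_dim + cov)
    # State-specific: one subspace vector v_j per state.
    return shared + num_states * subspace_dim


# Hypothetical sizes, chosen only for illustration.
STATES, COMPONENTS, DIM = 1500, 8, 39
SHARED_GAUSSIANS, SUBSPACE_DIM = 400, 40

print("diagonal GMM:", diag_gmm_params(STATES, COMPONENTS, DIM))
print("full-cov GMM:", full_gmm_params(STATES, COMPONENTS, DIM))
print("SGMM:        ", sgmm_params(STATES, SHARED_GAUSSIANS, DIM, SUBSPACE_DIM))
```

Under these assumed sizes, the SGMM models full covariances with a parameter count close to that of the diagonal GMM and roughly a tenth of that of the per-state full-covariance GMM, which is the sense in which the correlation is handled parsimoniously.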


Bibliographic reference. Qi, Jun / Wang, Dong / Tejedor, Javier (2013): "Subspace models for bottleneck features", In INTERSPEECH-2013, 1746-1750.