Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs

Linxue Bai, Peter Jančovič, Martin Russell, Philip Weber, Steve Houghton


Most state-of-the-art automatic speech recognition (ASR) systems use a single deep neural network (DNN) to map the acoustic space to the decision space. However, different phonetic classes employ different production mechanisms and are best described by different types of features. Hence, it may be advantageous to replace this single DNN with several phone class dependent DNNs. The appropriate mathematical formalism for this is a manifold. This paper assesses the use of a non-linear manifold structure with multiple DNNs for phone classification. The system has two levels. The first level comprises a set of broad phone class (BPC) dependent DNN-based mappings, and the second is a fusion network. Various ways of designing and training the networks at both levels are assessed, including varying the size of the hidden layers, using either the bottleneck or the softmax outputs of the BPC-dependent DNNs as input to the fusion network, and using different broad class definitions. Phone classification experiments are performed on TIMIT. The results show that using the BPC-dependent DNNs provides small but significant improvements in phone classification accuracy relative to a single global DNN. The paper concludes with visualisations of the structures learned by the local and global DNNs and a discussion of their interpretation.
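The two-level structure described above can be sketched as a forward pass: each BPC-dependent network maps an acoustic feature vector to a local output, the local outputs are concatenated, and a fusion network maps them to phone posteriors. This is a minimal illustration in plain Python, not the authors' implementation. The feature dimension, the four broad class names, the layer sizes, the bottleneck width, the 39-phone output set and the random (untrained) weights are all hypothetical, training is omitted, and it is assumed that every local network sees the same input frame. As a simplification, the "softmax" option just applies a softmax to the same local output layer that otherwise serves as the bottleneck.

```python
import math
import random

def dense(x, w, b):
    # Affine layer: y_j = sum_i x_i * w[i][j] + b_j
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def init_mlp(sizes, rng):
    # One (weights, biases) pair per layer, with small random weights
    return [([[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_out)]
              for _ in range(n_in)], [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x, final_softmax):
    h = x
    for k, (w, b) in enumerate(params):
        h = dense(h, w, b)
        if k < len(params) - 1:
            h = relu(h)          # hidden layers
        elif final_softmax:
            h = softmax(h)       # optional softmax on the output layer
    return h

# Hypothetical configuration (not taken from the paper)
FEAT_DIM = 39
BPCS = ["vowels", "stops", "fricatives", "nasals"]
LOCAL_OUT = 8
N_PHONES = 39

rng = random.Random(0)
# Level 1: one DNN per broad phone class
local_nets = {c: init_mlp([FEAT_DIM, 32, LOCAL_OUT], rng) for c in BPCS}
# Level 2: fusion network over the concatenated local outputs
fusion_net = init_mlp([LOCAL_OUT * len(BPCS), 64, N_PHONES], rng)

def classify(x, local_softmax=False):
    fused_input = []
    for c in BPCS:
        # Linear (bottleneck-style) output, or softmax output if requested
        fused_input.extend(mlp_forward(local_nets[c], x, final_softmax=local_softmax))
    # Phone posteriors from the fusion network
    return mlp_forward(fusion_net, fused_input, final_softmax=True)
```

Calling `classify([0.1] * FEAT_DIM)` returns a list of 39 non-negative values that sum to 1, one per hypothetical phone class. A single global DNN baseline would correspond to one `init_mlp` network mapping the features directly to `N_PHONES` outputs.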


DOI: 10.21437/Interspeech.2017-1179

Cite as: Bai, L., Jančovič, P., Russell, M., Weber, P., Houghton, S. (2017) Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs. Proc. Interspeech 2017, 319-323, DOI: 10.21437/Interspeech.2017-1179.


@inproceedings{Bai2017,
  author={Linxue Bai and Peter Jančovič and Martin Russell and Philip Weber and Steve Houghton},
  title={Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={319--323},
  doi={10.21437/Interspeech.2017-1179},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1179}
}