Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling

Yuzong Liu, Katrin Kirchhoff


In this paper we investigate neural graph embeddings as front-end features for various deep neural network (DNN) architectures for speech recognition. Neural graph embedding features are produced by an autoencoder that maps graph structures defined over speech samples to a continuous vector space. The resulting feature representation is then used to augment the standard acoustic features at the input level of a DNN classifier. We compare two different neural graph embedding methods, one based on a local neighborhood graph encoding, and another based on a global similarity graph encoding. They are evaluated in DNN-HMM-based and LSTM-CTC-based ASR systems on a 110-hour Switchboard conversational speech recognition task. Significant improvements in word error rates are achieved by both methods in the DNN-HMM system, and by global graph embeddings in the LSTM-CTC system.
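The pipeline sketched in the abstract (per-frame graph encoding, autoencoder bottleneck embedding, feature-level augmentation) can be illustrated roughly as follows. This is a minimal PyTorch sketch under assumed dimensions and layer sizes; the graph-encoding size, network widths, and variable names are placeholders for illustration, not the configuration reported in the paper.

# Hypothetical sketch of a neural graph embedding front end; all
# dimensions and names are illustrative assumptions, not the setup
# used in the paper.
import torch
import torch.nn as nn

class GraphEmbeddingAutoencoder(nn.Module):
    """Compresses a per-frame graph encoding (e.g., similarities to a
    set of reference speech samples) into a low-dimensional embedding
    used as an auxiliary acoustic feature."""
    def __init__(self, graph_dim=1000, embed_dim=40):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(graph_dim, 256), nn.Tanh(),
            nn.Linear(256, embed_dim), nn.Tanh(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.Tanh(),
            nn.Linear(256, graph_dim),
        )

    def forward(self, graph_vec):
        code = self.encoder(graph_vec)   # neural graph embedding
        recon = self.decoder(code)       # reconstruction used for training
        return code, recon

# Training objective: reconstruct the graph encoding (MSE loss).
model = GraphEmbeddingAutoencoder()
graph_vec = torch.randn(32, 1000)        # batch of per-frame graph encodings
code, recon = model(graph_vec)
loss = nn.functional.mse_loss(recon, graph_vec)

# At the acoustic-model input, the embedding augments the standard
# features: concatenate along the feature dimension.
mfcc = torch.randn(32, 440)              # e.g., spliced MFCC frames (assumed size)
dnn_input = torch.cat([mfcc, code.detach()], dim=1)

The same concatenated input could feed either a DNN-HMM hybrid network or an LSTM-CTC model; only the downstream classifier changes.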


DOI: 10.21437/Interspeech.2016-542

Cite as

Liu, Y., Kirchhoff, K. (2016) Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling. Proc. Interspeech 2016, 793-797.

Bibtex
@inproceedings{Liu+2016,
  author={Yuzong Liu and Katrin Kirchhoff},
  title={Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling},
  year=2016,
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-542},
  url={http://dx.doi.org/10.21437/Interspeech.2016-542},
  pages={793--797}
}