Deep Stacked Autoencoders for Spoken Language Understanding

Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori


Automatic transcription of spoken documents produces many word errors, especially under very noisy conditions. Document representations based on neural embedding frameworks have recently shown significant improvements in various Spoken and Natural Language Understanding tasks, such as denoising and filtering. Nonetheless, these methods mainly require clean representations and fail to properly remove the noise contained in noisy ones. This paper studies the impact of residual noise contained in automatic transcripts of spoken dialogues on the highly abstract spaces learned by deep neural networks. It makes the assumption that noise learned from “clean” manual transcripts of spoken documents dramatically degrades the performance of theme identification systems in noisy conditions. The proposed deep neural network takes highly imperfect transcripts of spoken dialogues as both input and output, in order to improve the robustness of the document representation in a noisy environment. Results obtained on the DECODA dialogue theme classification task reach an accuracy of 82%, a significant gain of about 5 points.
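The core idea — an autoencoder trained with the same noisy transcript representation as both input and target — can be sketched as a single tied-weight layer of a stacked autoencoder. This is a minimal illustration only: the layer sizes, learning rate, random toy data, and squared-error loss below are assumptions for the sketch, not the architecture or hyperparameters used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoencoderLayer:
    """One layer of a stacked autoencoder with tied weights.

    encode: x -> h (abstract representation); decode: h -> x_hat.
    Training minimizes squared reconstruction error between x_hat and
    the (noisy) input x itself, as in the input == output setup above.
    """
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b = np.zeros(n_hidden)   # encoder bias
        self.c = np.zeros(n_in)       # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.c)  # tied weights: W.T

    def train_step(self, x, lr=0.1):
        h = self.encode(x)
        x_hat = self.decode(h)
        # Backprop through the sigmoid units; W gets gradient contributions
        # from both the encoder and the (tied) decoder.
        d_out = (x_hat - x) * x_hat * (1.0 - x_hat)
        d_hid = (d_out @ self.W) * h * (1.0 - h)
        grad_W = x[:, None] @ d_hid[None, :] + d_out[:, None] @ h[None, :]
        self.W -= lr * grad_W
        self.b -= lr * d_hid
        self.c -= lr * d_out
        return float(np.mean((x_hat - x) ** 2))

# Toy "noisy transcript" vectors (binary bag-of-words), purely illustrative.
rng = np.random.default_rng(1)
X = (rng.random((20, 30)) < 0.2).astype(float)

ae = AutoencoderLayer(n_in=30, n_hidden=10)
losses = [float(np.mean([ae.train_step(x) for x in X])) for _ in range(50)]
```

Stacking then consists of training a second such layer on the hidden codes `ae.encode(x)` of the first; the deepest code serves as the document representation fed to the theme classifier.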


DOI: 10.21437/Interspeech.2016-63

Cite as

Janod, K., Morchid, M., Dufour, R., Linarès, G., Mori, R.D. (2016) Deep Stacked Autoencoders for Spoken Language Understanding. Proc. Interspeech 2016, 720-724.

Bibtex
@inproceedings{Janod+2016,
  author={Killian Janod and Mohamed Morchid and Richard Dufour and Georges Linarès and Renato De Mori},
  title={Deep Stacked Autoencoders for Spoken Language Understanding},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-63},
  url={http://dx.doi.org/10.21437/Interspeech.2016-63},
  pages={720--724}
}