On the Correlation and Transferability of Features Between Automatic Speech Recognition and Speech Emotion Recognition

Haytham M. Fayek, Margaret Lech, Lawrence Cavedon


The correlation between Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) is poorly understood. Studying this correlation may pave the way for integrating both tasks into a single system, or may provide insights that help advance both systems, such as improving how ASR handles emotional speech or embedding linguistic input into SER. In this paper, we quantify the relation between ASR and SER by studying the relevance of features learned for each task to the other in deep convolutional neural networks using transfer learning. Experiments are conducted using the TIMIT and IEMOCAP databases. Results reveal an intriguing correlation between the two tasks: features learned in some layers, particularly the initial layers of the network, for either task were found to be applicable to the other task to varying degrees.


DOI: 10.21437/Interspeech.2016-868

Cite as

Fayek, H.M., Lech, M., Cavedon, L. (2016) On the Correlation and Transferability of Features Between Automatic Speech Recognition and Speech Emotion Recognition. Proc. Interspeech 2016, 3618-3622.

Bibtex
@inproceedings{Fayek+2016,
author={Haytham M. Fayek and Margaret Lech and Lawrence Cavedon},
title={On the Correlation and Transferability of Features Between Automatic Speech Recognition and Speech Emotion Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-868},
url={http://dx.doi.org/10.21437/Interspeech.2016-868},
pages={3618--3622}
}