ISCA Archive Interspeech 2021

Applying TDNN Architectures for Analyzing Duration Dependencies on Speech Emotion Recognition

Pooja Kumawat, Aurobinda Routray

We analyze Time Delay Neural Network (TDNN) based architectures for speech emotion classification. TDNN models efficiently capture temporal information and provide an utterance-level prediction. Emotions are dynamic in nature and require temporal context for reliable prediction. We apply the TDNN-based x-vector and the Emphasized Channel Attention, Propagation and Aggregation based TDNN (ECAPA-TDNN) architectures to speech emotion identification on the RAVDESS, Emo-DB, and IEMOCAP databases. The results show that the TDNN architectures are very efficient at predicting emotion classes and that ECAPA-TDNN outperforms the TDNN-based x-vector architecture. Next, we investigate the performance of ECAPA-TDNN with various training chunk durations and test utterance durations. We find that, despite very promising emotion recognition performance, the TDNN models have a strong bias that depends on the training chunk duration. Earlier research revealed that individual emotion class accuracy depends largely on the test utterance duration, but most of these studies were based on frame-level emotion predictions; utterance-level emotion recognition is relatively less explored. Our results show that even with the TDNN models, the accuracy of the different emotion classes depends on the utterance duration.
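As an illustration of what a "training chunk duration" means in practice, the following is a minimal sketch of cropping a fixed-duration chunk from an utterance before feeding it to a model. It is not the authors' pipeline: the function name `crop_chunk`, the 16 kHz sample rate, the 2 s chunk length, and the zero-padding of short utterances are all assumptions made for this example.

```python
import random


def crop_chunk(samples, sample_rate, chunk_seconds, rng=None):
    """Return one random fixed-duration chunk from a sequence of audio samples.

    Hypothetical helper for illustration only. Utterances shorter than the
    chunk are zero-padded at the end (an assumed policy; the abstract does
    not say how short utterances are handled).
    """
    rng = rng or random.Random()
    chunk_len = int(round(chunk_seconds * sample_rate))
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    if len(samples) <= chunk_len:
        # Short utterance: keep everything and pad with silence.
        return list(samples) + [0.0] * (chunk_len - len(samples))
    # Long utterance: pick a random start so the chunk fits entirely inside.
    start = rng.randrange(len(samples) - chunk_len + 1)
    return list(samples[start:start + chunk_len])


# Assumed example: a 5 s utterance at 16 kHz cropped to a 2 s training chunk.
utterance = [0.0] * (5 * 16000)
chunk = crop_chunk(utterance, 16000, 2.0, random.Random(0))
print(len(chunk))  # 32000
```

Under this sketch, a model trained only on chunks of one length never sees longer or shorter temporal context, which is one plausible way a chunk-duration bias of the kind reported above could arise.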


doi: 10.21437/Interspeech.2021-2168

Cite as: Kumawat, P., Routray, A. (2021) Applying TDNN Architectures for Analyzing Duration Dependencies on Speech Emotion Recognition. Proc. Interspeech 2021, 3410-3414, doi: 10.21437/Interspeech.2021-2168

@inproceedings{kumawat21_interspeech,
  author={Pooja Kumawat and Aurobinda Routray},
  title={{Applying TDNN Architectures for Analyzing Duration Dependencies on Speech Emotion Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3410--3414},
  doi={10.21437/Interspeech.2021-2168}
}