ISCA Archive Interspeech 2021

Shallow Convolution-Augmented Transformer with Differentiable Neural Computer for Low-Complexity Classification of Variable-Length Acoustic Scene

Soonshin Seo, Donghyun Lee, Ji-Hwan Kim

Convolutional neural networks (CNNs) perform well in low-complexity classification of fixed-length acoustic scenes. However, previous studies have not considered variable-length acoustic scenes, in which performance degradation is prevalent. In this regard, we investigate two novel architectures: the convolution-augmented Transformer (Conformer) and the differentiable neural computer (DNC). Both models perform well on variable-length data but require a large amount of data. In other words, small amounts of data, such as those from acoustic scenes, lead to overfitting in these models. In this paper, we propose a shallow convolution-augmented Transformer with a differentiable neural computer (shallow Conformer-DNC) for the low-complexity classification of variable-length acoustic scenes. The shallow Conformer-DNC is able to converge with small amounts of data. The shallow Conformer and the shallow DNC learn the short-term and long-term contexts of variable-length acoustic scenes, respectively. The experiments were conducted under variable-length conditions using the TAU Urban Acoustic Scenes 2020 Mobile dataset. As a result, a peak accuracy of 61.25% was achieved by the shallow Conformer-DNC with 34 K model parameters, which is comparable to the performance of state-of-the-art CNNs.


doi: 10.21437/Interspeech.2021-1308

Cite as: Seo, S., Lee, D., Kim, J.-H. (2021) Shallow Convolution-Augmented Transformer with Differentiable Neural Computer for Low-Complexity Classification of Variable-Length Acoustic Scene. Proc. Interspeech 2021, 576-580, doi: 10.21437/Interspeech.2021-1308

@inproceedings{seo21_interspeech,
  author={Soonshin Seo and Donghyun Lee and Ji-Hwan Kim},
  title={{Shallow Convolution-Augmented Transformer with Differentiable Neural Computer for Low-Complexity Classification of Variable-Length Acoustic Scene}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={576--580},
  doi={10.21437/Interspeech.2021-1308}
}