ISCA Archive Interspeech 2021

Dynamic Multi-Scale Convolution for Dialect Identification

Tianlong Kong, Shouyi Yin, Dawei Zhang, Wang Geng, Xin Wang, Dandan Song, Jinwen Huang, Huiyu Shi, Xiaorui Wang

Time Delay Neural Network (TDNN)-based methods are widely used in dialect identification. However, previous TDNN-based work neglects subtle variation across different feature scales. To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling. Dynamic kernel convolution adaptively captures features over both short-term and long-term contexts. Local multi-scale learning, which represents multi-scale features at a granular level, enlarges the receptive field of the convolution operation. In addition, global multi-scale pooling aggregates features from different bottleneck layers to collect information from multiple aspects. The proposed architecture significantly outperforms the state-of-the-art system on the AP20-OLR-dialect-task of the Oriental Language Recognition (OLR) Challenge 2020, with the best average cost performance (Cavg) of 0.067 and the best equal error rate (EER) of 6.52%. Compared with the previously best-known results, our method achieves relative improvements of 9% in Cavg and 45% in EER, while using 91% fewer parameters than the best-known model.
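The three ingredients the abstract names can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tiny per-channel gating that blends the two kernel branches and the use of mean/std statistics for pooling are assumptions standing in for the paper's learned attention and pooling layers; the local multi-scale step follows the familiar Res2Net-style hierarchical split often used for "granular" multi-scale features.

```python
import numpy as np

def conv1d(x, kernel):
    # 'same'-length 1D convolution along time, applied per channel
    pad = len(kernel) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    return np.stack([np.convolve(xp[c], kernel, mode="valid")
                     for c in range(x.shape[0])])

def dynamic_kernel_conv(x, k_short, k_long):
    # Two branches with different receptive fields (short vs. long context).
    a = conv1d(x, k_short)
    b = conv1d(x, k_long)
    # Hypothetical gating: a per-channel softmax over two logits derived
    # from a pooled summary, standing in for the paper's learned attention.
    s = (a + b).mean(axis=1)                      # (C,)
    logits = np.stack([s, -s])                    # (2, C)
    w = np.exp(logits) / np.exp(logits).sum(axis=0)
    return w[0][:, None] * a + w[1][:, None] * b  # adaptive blend

def local_multiscale(x, kernel, scale=4):
    # Res2Net-style assumption: split channels into `scale` groups; each
    # group's conv output feeds the next group's input, so later groups
    # see a progressively larger receptive field.
    splits = np.split(x, scale, axis=0)           # C must divide by scale
    ys = [splits[0]]                              # first split passes through
    for s in splits[1:]:
        ys.append(conv1d(s + ys[-1], kernel))
    return np.concatenate(ys, axis=0)

def global_multiscale_pool(layer_feats):
    # Aggregate simple statistics (mean, std over time) from several
    # bottleneck layers into one utterance-level embedding.
    return np.concatenate([np.concatenate([f.mean(axis=1), f.std(axis=1)])
                           for f in layer_feats])
```

As a rough usage pattern, each frame-level feature map would pass through the dynamic kernel and local multi-scale blocks, and the outputs of several such blocks would be concatenated by the global pooling step before classification.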

doi: 10.21437/Interspeech.2021-56

Cite as: Kong, T., Yin, S., Zhang, D., Geng, W., Wang, X., Song, D., Huang, J., Shi, H., Wang, X. (2021) Dynamic Multi-Scale Convolution for Dialect Identification. Proc. Interspeech 2021, 3261-3265, doi: 10.21437/Interspeech.2021-56

@inproceedings{kong21_interspeech,
  author={Tianlong Kong and Shouyi Yin and Dawei Zhang and Wang Geng and Xin Wang and Dandan Song and Jinwen Huang and Huiyu Shi and Xiaorui Wang},
  title={{Dynamic Multi-Scale Convolution for Dialect Identification}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3261--3265},
  doi={10.21437/Interspeech.2021-56}
}