Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition

Sung-Lin Yeh, Gao-Yi Chao, Bo-Hao Su, Yu-Lin Huang, Meng-Han Lin, Yin-Chun Tsai, Yu-Wen Tai, Zheng-Chi Lu, Chieh-Yu Chen, Tsung-Ming Tai, Chiu-Wang Tseng, Cheng-Kuang Lee, Chi-Chun Lee


In this study, we present attention-based networks combined with data augmentation methods for our participation in the INTERSPEECH 2019 ComParE Challenge, specifically in three Sub-challenges: Styrian Dialect Recognition, Continuous Sleepiness Regression, and Baby Sound Classification. In the Styrian Dialect Sub-challenge, the dialects are classified into Northern Styrian (NorthernS), Urban Styrian (UrbanS), and Eastern Styrian (EasternS). Our proposed model achieves a UAR of 49.5% on the test set, which is 2.5% higher than the baseline. The Continuous Sleepiness Sub-challenge is defined as a regression task with scores ranging from 1 (extremely alert) to 9 (very sleepy). Our proposed architecture achieves a Spearman correlation of 0.369 on the test set, which surpasses the baseline model by 0.026. In the Baby Sound Sub-challenge, infant sounds are classified into canonical babbling, non-canonical babbling, crying, laughing, and junk/other. Our proposed augmentation framework achieves a UAR of 62.39% on the test set, which outperforms the baseline by about 3.7%. Overall, our analyses demonstrate that fusing attention network models with a conventional support vector machine improves robustness on the test set, and that the recognition rates of these paralinguistic attributes generally improve when data augmentation is performed.


DOI: 10.21437/Interspeech.2019-2110

Cite as: Yeh, S., Chao, G., Su, B., Huang, Y., Lin, M., Tsai, Y., Tai, Y., Lu, Z., Chen, C., Tai, T., Tseng, C., Lee, C., Lee, C. (2019) Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition. Proc. Interspeech 2019, 2398-2402, DOI: 10.21437/Interspeech.2019-2110.


@inproceedings{Yeh2019,
  author={Sung-Lin Yeh and Gao-Yi Chao and Bo-Hao Su and Yu-Lin Huang and Meng-Han Lin and Yin-Chun Tsai and Yu-Wen Tai and Zheng-Chi Lu and Chieh-Yu Chen and Tsung-Ming Tai and Chiu-Wang Tseng and Cheng-Kuang Lee and Chi-Chun Lee},
  title={{Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2398--2402},
  doi={10.21437/Interspeech.2019-2110},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2110}
}