Convolutional Neural Networks with Data Augmentation for Classifying Speakers’ Native Language

Gil Keren, Jun Deng, Jouni Pohjalainen, Björn Schuller


We use a feedforward Convolutional Neural Network to classify speakers’ native language for the INTERSPEECH 2016 Computational Paralinguistics Challenge Native Language Sub-Challenge. The input uses no features specialized for computational paralinguistics tasks, only MFCCs with their first- and second-order deltas. In addition, we augment the training data by replacing the original examples with shorter overlapping samples extracted from them, which multiplies the number of training examples by almost 40. With the augmented training dataset and enhancements to neural network models such as Batch Normalization, Dropout, and the Maxout activation function, we improve upon the challenge baseline by a large margin on both the development and the test set.
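
The augmentation described in the abstract, replacing each recording with shorter overlapping samples, can be illustrated with a minimal sketch. The function name `extract_overlapping_windows`, the window length, the hop size, and the 39-dimensional feature layout (13 MFCCs plus first- and second-order deltas) are all assumptions made for illustration; the abstract does not state the values the authors used.

```python
import numpy as np


def extract_overlapping_windows(features, window_len, hop):
    """Cut a (num_frames, num_coeffs) feature matrix into overlapping windows.

    Hypothetical helper: window_len and hop are illustrative parameters,
    not values taken from the paper.
    """
    num_frames, num_coeffs = features.shape
    if num_frames < window_len:
        # Recording is too short to yield even one full window.
        return np.empty((0, window_len, num_coeffs), dtype=features.dtype)
    # Window start positions, spaced `hop` frames apart.
    starts = range(0, num_frames - window_len + 1, hop)
    return np.stack([features[s:s + window_len] for s in starts])


# Toy stand-in for one utterance: 1000 frames of 39-dimensional features
# (assumed layout: 13 MFCCs + 13 deltas + 13 delta-deltas).
rng = np.random.default_rng(0)
utterance = rng.standard_normal((1000, 39))

# Each window becomes its own training example and inherits the label
# of the recording it was cut from (label 3 is arbitrary here).
windows = extract_overlapping_windows(utterance, window_len=200, hop=50)
labels = np.full(len(windows), 3)
print(windows.shape)
```

A smaller hop yields more, and more strongly overlapping, training examples per recording; how predictions on the individual windows are combined into one decision per recording is not covered by the abstract and is left out of this sketch.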


DOI: 10.21437/Interspeech.2016-261

Cite as

Keren, G., Deng, J., Pohjalainen, J., Schuller, B. (2016) Convolutional Neural Networks with Data Augmentation for Classifying Speakers’ Native Language. Proc. Interspeech 2016, 2393-2397.

BibTeX
@inproceedings{Keren+2016,
author={Gil Keren and Jun Deng and Jouni Pohjalainen and Björn Schuller},
title={Convolutional Neural Networks with Data Augmentation for Classifying Speakers’ Native Language},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-261},
url={http://dx.doi.org/10.21437/Interspeech.2016-261},
pages={2393--2397}
}