In this paper, we use deep neural networks (DNNs) to automatically identify native accents in English and Mandarin when no text, speaker, or gender information is available for the speech data. Compared with conventional methods based on the Gaussian mixture model (GMM), the proposed method has two main advantages: first, DNNs are discriminative models, which can better separate the regions where different accents are easily confused; second, their hierarchical nonlinear feature extraction can learn discriminative high-level features for the specified task. In detail, the speech data of all accents is used to train the DNNs; in the testing stage, we first identify the accent label of each frame, then determine the sentence label by majority voting over the frame labels. Experiments on accented English and Mandarin corpora demonstrate that, compared with the GMM-based methods, the proposed method significantly improves both frame accuracy and sentence accuracy on the test set. Moreover, the performance of the proposed method can be further improved by using context information.
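The testing-stage decision described above (a label per frame, then a majority vote per sentence) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the accent names, the posterior values, and the tie-breaking rule (the label seen first wins) are all assumptions, and the DNN itself is replaced here by a hand-written list of per-frame scores.

```python
from collections import Counter
from typing import List, Sequence


def frame_labels_from_scores(
    scores: Sequence[Sequence[float]], accents: Sequence[str]
) -> List[str]:
    """Pick the highest-scoring accent for each frame.

    `scores[t][k]` is a hypothetical classifier score for accent k at frame t
    (for example, a DNN output posterior).
    """
    labels = []
    for frame in scores:
        if len(frame) != len(accents):
            raise ValueError("each frame needs one score per accent")
        best = max(range(len(accents)), key=lambda k: frame[k])
        labels.append(accents[best])
    return labels


def sentence_label(frame_labels: Sequence[str]) -> str:
    """Majority vote over frame labels.

    Assumption: ties are broken in favour of the label encountered first,
    which is what Counter.most_common does for equal counts.
    """
    if not frame_labels:
        raise ValueError("need at least one frame label")
    return Counter(frame_labels).most_common(1)[0][0]


# Hypothetical three-frame sentence with two accent classes.
accents = ["british", "american"]
scores = [[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]]
frames = frame_labels_from_scores(scores, accents)
decision = sentence_label(frames)  # two of three frames vote "british"
```

Voting on hard frame labels discards how confident each frame was; summing log-posteriors across frames is a common alternative, but the abstract describes majority voting, so the sketch follows that.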
Bibliographic reference. Chen, Mingming / Yang, Zhanlei / Zheng, Hao / Liu, Wenju (2014): "Improving native accent identification using deep neural networks", In INTERSPEECH-2014, 2170-2174.