Deep Learning Based Mandarin Accent Identification for Accent Robust ASR

Felix Weninger, Yang Sun, Junho Park, Daniel Willett, Puming Zhan


In this paper, we present an in-depth study on the classification of regional accents in Mandarin speech. Experiments are carried out on Mandarin speech data systematically collected from 15 different geographical regions in China for broad coverage. We explore bidirectional Long Short-Term Memory (bLSTM) networks and i-vectors to model longer-term acoustic context. Starting from the classification of the collected data into the 15 regional accents, we derive a three-class grouping via non-metric multidimensional scaling (NMDS), for which 68.4% average recall can be obtained. Furthermore, we evaluate a state-of-the-art ASR system on the accented data and demonstrate that the character error rate (CER) varies strongly among these accent groups, even when i-vector speaker adaptation is used. Finally, we show that model selection based on the prediction of our bLSTM accent classifier can yield up to 7.6% CER reduction for accented speech.
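
The two metrics quoted in the abstract can be illustrated with a minimal Python sketch. It assumes that "average recall" means the unweighted mean of per-class recall and that CER is the character-level edit distance divided by the reference length; the abstract does not spell out either definition, and the function names `average_recall` and `char_error_rate` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition)."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("y_true must not be empty")
    return sum(correct[c] / total[c] for c in total) / len(total)


def char_error_rate(reference, hypothesis):
    """Character-level Levenshtein distance divided by reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)
```

Averaging recall per class rather than per sample keeps a frequent accent group from dominating the score, which matters when the classes are unevenly represented.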


DOI: 10.21437/Interspeech.2019-2737

Cite as: Weninger, F., Sun, Y., Park, J., Willett, D., Zhan, P. (2019) Deep Learning Based Mandarin Accent Identification for Accent Robust ASR. Proc. Interspeech 2019, 510-514, DOI: 10.21437/Interspeech.2019-2737.


@inproceedings{Weninger2019,
  author={Felix Weninger and Yang Sun and Junho Park and Daniel Willett and Puming Zhan},
  title={{Deep Learning Based Mandarin Accent Identification for Accent Robust ASR}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={510--514},
  doi={10.21437/Interspeech.2019-2737},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2737}
}