ISCA Archive Interspeech 2015

Articulatory-based conversion of foreign accents with deep neural networks

Sandesh Aryal, Ricardo Gutierrez-Osuna

We present an articulatory-based method for real-time accent conversion using deep neural networks (DNNs). The approach consists of two steps. First, we train a DNN articulatory synthesizer for the non-native speaker that estimates acoustics from contextualized articulatory gestures. Then we drive the DNN with articulatory gestures from a reference native speaker, mapped to the non-native articulatory space via a Procrustes transform. We evaluate the accent-conversion performance of the DNN through a series of listening tests of intelligibility, voice identity, and non-native accentedness. Compared to a baseline method based on Gaussian mixture models, the DNN accent conversions were found to be 31% more intelligible and were perceived as more native-like in 68% of the cases. The DNN also succeeded in preserving the voice identity of the non-native speaker.
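The cross-speaker articulatory mapping mentioned in the abstract can be illustrated with a minimal orthogonal Procrustes sketch: given paired articulatory frames from the native and non-native speakers, find the rotation (plus translation) that best maps one space onto the other. This is an assumption-laden illustration, not the paper's exact procedure — the authors' variant may include scaling or operate on specific articulator subsets; the function names `procrustes_map` and `apply_map` are hypothetical.

```python
import numpy as np

def procrustes_map(native, nonnative):
    """Estimate an orthogonal Procrustes mapping (rotation + translation)
    from native articulatory frames onto the non-native space.

    native, nonnative: (n_frames, n_dims) arrays of paired frames.
    """
    # Center both sets of frames.
    mu_a, mu_b = native.mean(axis=0), nonnative.mean(axis=0)
    A, B = native - mu_a, nonnative - mu_b
    # Optimal rotation comes from the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    return R, mu_a, mu_b

def apply_map(frames, R, mu_a, mu_b):
    """Map native-space frames into the non-native articulatory space."""
    return (frames - mu_a) @ R + mu_b
```

In the pipeline sketched by the abstract, the mapped frames would then be fed to the non-native speaker's DNN articulatory synthesizer to produce accent-converted acoustics.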

doi: 10.21437/Interspeech.2015-145

Cite as: Aryal, S., Gutierrez-Osuna, R. (2015) Articulatory-based conversion of foreign accents with deep neural networks. Proc. Interspeech 2015, 3385-3389, doi: 10.21437/Interspeech.2015-145

@inproceedings{aryal15_interspeech,
  author={Sandesh Aryal and Ricardo Gutierrez-Osuna},
  title={{Articulatory-based conversion of foreign accents with deep neural networks}},
  booktitle={Proc. Interspeech 2015},
  year={2015},
  pages={3385--3389},
  doi={10.21437/Interspeech.2015-145}
}