Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

Li-Wei Chen, Hung-Yi Lee, Yu Tsao


This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed. Because collecting parallel data from such patients is difficult, VC without parallel data is highly desirable. Although techniques for non-parallel VC, such as CycleGAN, have been developed, they usually focus on transforming speaker identity, directly converting the speech of one speaker into that of another, and therefore do not address the task considered here. In this paper, we propose a new approach to non-parallel VC that transforms impaired speech into normal speech while preserving the linguistic content and speaker characteristics. To our knowledge, this is the first end-to-end GAN-based unsupervised VC model applied to impaired speech. Experimental results show that the proposed approach outperforms CycleGAN.
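As background for the CycleGAN baseline mentioned above: CycleGAN learns two generators between domains without paired data by penalizing the round-trip reconstruction error (the cycle-consistency loss), in addition to the adversarial losses. The sketch below illustrates only that cycle-consistency term; the generators `G` and `F` here are toy stand-ins, not the paper's models.

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 error after the round trip x -> G(x) -> F(G(x)).

    In unpaired VC, G would map source-domain features (e.g. impaired
    speech) to the target domain and F would map them back; minimizing
    this loss encourages the mappings to preserve content.
    """
    return np.mean(np.abs(F(G(x)) - x))

# Toy mappings standing in for learned generators (illustrative only).
G = lambda x: 2.0 * x + 1.0          # "forward" generator
F = lambda y: (y - 1.0) / 2.0        # exact inverse of G

x = np.array([0.5, -1.0, 2.0])       # stand-in for acoustic features
loss = cycle_consistency_loss(x, G, F)  # exact inverse, so loss is 0.0
```

With real generators the inverse is only approximate, so the loss is positive and is minimized jointly with the adversarial objectives during training.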


DOI: 10.21437/Interspeech.2019-1265

Cite as: Chen, L., Lee, H., Tsao, Y. (2019) Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech. Proc. Interspeech 2019, 719-723, DOI: 10.21437/Interspeech.2019-1265.


@inproceedings{Chen2019,
  author={Li-Wei Chen and Hung-Yi Lee and Yu Tsao},
  title={{Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={719--723},
  doi={10.21437/Interspeech.2019-1265},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1265}
}