Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion

Nirmesh Shah, Maulik C. Madhavi, Hemant Patil


In non-parallel Voice Conversion (VC) with the Iterative combination of Nearest Neighbor search step and Conversion step Alignment (INCA) algorithm, one-to-many and many-to-one pairs in the training data degrade the performance of a stand-alone VC system. Handling these pairs during training remains little explored. In this paper, instead of mapping the source spectrum directly to the target spectrum, we establish the relationship through an intermediate speaker-independent posteriorgram representation. To that end, one Deep Neural Network (DNN) maps the source spectrum to the posteriorgram representation, and a second DNN maps this representation to the target speaker’s spectrum. We propose unsupervised Vocal Tract Length Normalization (VTLN)-based warped Gaussian posteriorgram features as the speaker-independent representation. We performed experiments on a small subset of the publicly available Voice Conversion Challenge (VCC) 2016 database. The proposed approach obtains lower Mel Cepstral Distortion (MCD) values than both the baseline and the supervised phonetic posteriorgram feature-based speaker-independent representation. Furthermore, subjective evaluation showed a relative improvement of 13.3% in Speaker Similarity (SS) with the proposed approach.
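To make the term "Gaussian posteriorgram" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a diagonal-covariance Gaussian mixture whose parameters are already available (in practice they would be estimated without labels, for example by EM on VTLN-warped features; the warping and the two DNNs are not shown). The function name `gaussian_posteriorgram` and the toy parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def gaussian_posteriorgram(frames, weights, means, variances):
    """Per-frame posterior over K diagonal-covariance Gaussian components.

    frames:    (T, D) feature vectors (e.g. cepstral frames)
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) per-dimension variances (diagonal covariance)
    returns:   (T, K) array in which each row sums to 1
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Log-likelihood of every frame under every component.
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_lik = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )                                                          # (T, K)

    # Add log priors, then normalise in the log domain for stability.
    log_joint = np.log(weights)[None, :] + log_lik
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


# Toy usage (assumed values): 5 frames drawn near the first component.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 3))
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
variances = np.ones((2, 3))
post = gaussian_posteriorgram(frames, weights, means, variances)
```

Each row of `post` is a probability vector over mixture components. In the pipeline described above, a vector of this kind would serve as the intermediate target of the first DNN and the input of the second.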


DOI: 10.21437/Interspeech.2018-1712

Cite as: Shah, N., Madhavi, M.C., Patil, H. (2018) Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion. Proc. Interspeech 2018, 1968-1972, DOI: 10.21437/Interspeech.2018-1712.


@inproceedings{Shah2018,
  author={Nirmesh Shah and Maulik C. Madhavi and Hemant Patil},
  title={Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1968--1972},
  doi={10.21437/Interspeech.2018-1712},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1712}
}