Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion

Nirmesh J. Shah, Hemant A. Patil


Nearest Neighbor (NN)-based alignment techniques are popular in non-parallel Voice Conversion (VC). The performance of NN-based alignment improves when phone boundary information is available. However, estimating exact phone boundaries is a challenging task. If the text corresponding to the utterance is available, a Hidden Markov Model (HMM) can be used to identify the phone boundaries. However, this requires a large amount of training data, which is difficult to collect in realistic VC scenarios. Hence, we propose to exploit a Spectral Transition Measure (STM)-based alignment technique that does not require any a priori training data. The idea behind STM is that neurons in the auditory or visual cortex respond more strongly to transitional stimuli than to steady-state stimuli. The phone boundaries estimated using the STM algorithm are then applied to the NN technique to obtain the aligned spectral features of the source and target speakers. The proposed STM+NN alignment technique gives, on average, a 13.67% relative improvement in phonetic accuracy (PA) compared to the NN-based alignment technique. The improvement in %PA after alignment is reflected in better speech quality and speaker similarity of the converted voice (relative improvements of 13.63% and 13.26%, respectively).
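To make the two building blocks named above concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not give the STM formula, so the sketch assumes a common formulation: the mean squared regression slope of each spectral feature over a short window, with local peaks taken as candidate phone boundaries. The function names, the window length, and the threshold are all hypothetical, and how the paper combines the boundaries with the NN search is not stated in the abstract, so that step is left out.

```python
import numpy as np


def spectral_transition_measure(feats, half_window=2):
    """Per-frame STM for a (n_frames, n_dims) feature matrix.

    Assumed formulation: fit a line to each feature over a window of
    2 * half_window + 1 frames and average the squared slopes.
    """
    feats = np.asarray(feats, dtype=float)
    n_frames = feats.shape[0]
    offsets = np.arange(-half_window, half_window + 1)
    denom = np.sum(offsets ** 2)
    # Repeat the edge frames so every frame has a full window.
    padded = np.pad(feats, ((half_window, half_window), (0, 0)), mode="edge")
    slopes = np.zeros_like(feats)
    for k in offsets:
        start = half_window + k
        slopes += k * padded[start:start + n_frames]
    slopes /= denom
    return np.mean(slopes ** 2, axis=1)


def pick_boundaries(stm, threshold):
    """Frame indices where STM has a local peak above the threshold."""
    stm = np.asarray(stm, dtype=float)
    mid = stm[1:-1]
    is_peak = (mid > stm[:-2]) & (mid >= stm[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1


def nearest_neighbor_align(source, target):
    """For each source frame, index of the closest target frame (Euclidean)."""
    src = np.asarray(source, dtype=float)
    tgt = np.asarray(target, dtype=float)
    dists = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    # Toy features: a steady segment, then an abrupt change to another one.
    toy = np.vstack([np.zeros((20, 3)), np.ones((20, 3))])
    stm = spectral_transition_measure(toy)
    print("candidate boundaries:", pick_boundaries(stm, threshold=0.01))
```

On real speech the features would typically be cepstral coefficients per frame, and the threshold would need tuning. The sketch only shows that STM is near zero in steady regions and peaks around a spectral transition, which is the property the boundary estimate relies on.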


DOI: 10.21437/Interspeech.2019-1504

Cite as: Shah, N.J., Patil, H.A. (2019) Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion. Proc. Interspeech 2019, 639-643, DOI: 10.21437/Interspeech.2019-1504.


@inproceedings{Shah2019,
  author={Nirmesh J. Shah and Hemant A. Patil},
  title={{Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={639--643},
  doi={10.21437/Interspeech.2019-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1504}
}