ISCA Archive Interspeech 2013

Real-time and non-real-time voice conversion systems with web interfaces

Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky

Two speech processing systems have been developed for real-time and non-real-time voice conversion. With the real-time system, the user can apply conversion during voice over IP (VoIP) calls, imitating the identity of a specified target speaker. The non-real-time system converts prerecorded audiobooks read by a professional reader so that they imitate the voice of the user. Both systems require some speech samples from the user for training. The training procedures are similar for both systems; however, the user is treated as the source speaker in the first case and as the target speaker in the second. For the parametric representation of speech we use a speech model based on instantaneous harmonic parameters with multicomponent sinusoidal excitation. The voice conversion itself is performed using artificial neural networks (ANN) with rectified linear units. Here we demonstrate implementations of the voice conversion systems with dedicated web interfaces and an iPhone application.
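
As a rough illustration of the conversion step mentioned above, the sketch below passes one frame of source-speaker parameters through a small feed-forward network with rectified linear units to produce a frame of target-speaker parameters. It is not the authors' implementation: the frame dimension, hidden layer size, random untrained weights, and the names `convert_frame`, `init_layer`, `FRAME_DIM`, and `HIDDEN_DIM` are all assumptions made for illustration, and the abstract does not specify the network topology or the exact features used.

```python
import random


def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in v]


def affine(weights, bias, v):
    """Compute weights @ v + bias for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Hypothetical random initialisation; a real system would train these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def convert_frame(source_params, layers):
    """Map one frame of source parameters to target parameters.

    Hidden layers use ReLU; the output layer is linear.
    """
    h = source_params
    for weights, bias in layers[:-1]:
        h = relu(affine(weights, bias, h))
    weights, bias = layers[-1]
    return affine(weights, bias, h)


# Assumed sizes, chosen only to keep the example small.
FRAME_DIM = 8
HIDDEN_DIM = 16

rng = random.Random(0)
layers = [
    init_layer(HIDDEN_DIM, FRAME_DIM, rng),  # input -> hidden
    init_layer(FRAME_DIM, HIDDEN_DIM, rng),  # hidden -> output
]

# Stand-in for one frame of per-frame speech model parameters.
source_frame = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_DIM)]
target_frame = convert_frame(source_frame, layers)
```

In a trained system the weights would be fitted on time-aligned source and target frames taken from the user's training samples; here they are random, so the output only demonstrates the data flow.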


Cite as: Azarov, E., Vashkevich, M., Likhachov, D., Petrovsky, A. (2013) Real-time and non-real-time voice conversion systems with web interfaces. Proc. Interspeech 2013, 2662-2663

@inproceedings{azarov13c_interspeech,
  author={Elias Azarov and Maxim Vashkevich and Denis Likhachov and Alexander Petrovsky},
  title={{Real-time and non-real-time voice conversion systems with web interfaces}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2662--2663}
}