Two speech processing systems have been developed for real-time and non-real-time voice conversion. Using the real-time processing the user can apply conversion during voice over IP (VoIP) calls imitating identity of a specified target speaker. Non-real-time processing system converts prerecorded audio books read by a professional reader imitating voice of the user. Both systems require some speech samples of the user for training. The training procedures are similar for both systems however the user is considered as a source speaker in the first case and as a target speaker in the second. For parametric representation of speech we use a speech model based on instantaneous harmonic parameters with multicomponent sinusoidal excitation. The voice conversion itself is made using artificial neural networks (ANN) with rectified linear units. Here we demonstrate implementations of the voice conversion systems with dedicated web interfaces and iPhone application.
Bibliographic reference. Azarov, Elias / Vashkevich, Maxim / Likhachov, Denis / Petrovsky, Alexander (2013): "Real-time and non-real-time voice conversion systems with web interfaces", In INTERSPEECH-2013, 2662-2663.