14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Real-Time and Non-Real-Time Voice Conversion Systems with Web Interfaces

Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky

BSUIR, Belarus

Two speech processing systems have been developed, one for real-time and one for non-real-time voice conversion. With the real-time system, the user can apply conversion during voice over IP (VoIP) calls, imitating the identity of a specified target speaker. The non-real-time system converts prerecorded audiobooks read by a professional reader so that they imitate the voice of the user. Both systems require speech samples from the user for training. The training procedures are similar for both systems; however, the user is treated as the source speaker in the first case and as the target speaker in the second. For the parametric representation of speech, we use a speech model based on instantaneous harmonic parameters with multicomponent sinusoidal excitation. The voice conversion itself is performed by artificial neural networks (ANN) with rectified linear units. Here we demonstrate implementations of the voice conversion systems with dedicated web interfaces and an iPhone application.
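As an illustration of the conversion step named in the abstract, the following is a minimal sketch of a feed-forward network with rectified linear units that maps one frame of source-speaker features to a frame of target-speaker features. It is not the authors' implementation: the layer sizes, the random initialisation, the linear output layer, and the names `convert_frame`, `init_layer`, `FEATURE_DIM`, and `HIDDEN_DIM` are all assumptions made for the example, and a trained system would learn its weights from paired speech samples rather than draw them at random.

```python
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def affine(weights, bias, inputs):
    """Compute weights @ inputs + bias for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def convert_frame(source_features, layers):
    """Map one source-speaker feature frame to a target-speaker frame.

    Hidden layers use ReLU; the output layer is kept linear (an assumption).
    """
    activations = source_features
    for weights, bias in layers[:-1]:
        activations = relu(affine(weights, bias, activations))
    weights, bias = layers[-1]
    return affine(weights, bias, activations)


rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM = 4, 8  # assumed sizes, not taken from the paper
layers = [init_layer(HIDDEN_DIM, FEATURE_DIM, rng),
          init_layer(FEATURE_DIM, HIDDEN_DIM, rng)]

# One made-up feature frame standing in for real speech parameters.
converted = convert_frame([0.5, -0.2, 0.1, 0.3], layers)
```

In the two systems described, the same kind of mapping would be trained in opposite directions: from the user to the target speaker for VoIP calls, and from the professional reader to the user for audiobooks.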

Bibliographic reference. Azarov, Elias / Vashkevich, Maxim / Likhachov, Denis / Petrovsky, Alexander (2013): "Real-time and non-real-time voice conversion systems with web interfaces", in INTERSPEECH-2013, 2662-2663.