14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Real-Time Voice Conversion Using Artificial Neural Networks with Rectified Linear Units

Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky

BSUIR, Belarus

This paper presents an approach to parametric voice conversion that can be used in real-time entertainment applications. The approach is based on spectral mapping using an artificial neural network (ANN) with rectified linear units (ReLU). To overcome the oversmoothing problem, a special network configuration is proposed that utilizes temporal states of the speaker. The speech is represented using the harmonic plus noise model, whose parameters are estimated using instantaneous harmonic parameters. The proposed voice conversion technique is compared to the main alternative approaches using objective and subjective measures.
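As a rough illustration of spectral mapping with a ReLU network, the sketch below passes one source spectral frame through a small feed-forward network. It is not the authors' configuration: the layer sizes (`FRAME_DIM`, `HIDDEN_DIM`), the random placeholder weights, the linear output layer, and all function names are assumptions made for the example. The temporal-state configuration and the harmonic plus noise analysis described in the abstract are not reproduced here.

```python
import random


def relu(values):
    """Rectified linear unit: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, inputs):
    """Affine layer; `weights` is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (placeholders for trained values)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def convert_frame(layers, source_frame):
    """Map one source spectral frame to a target frame.

    Hidden layers use ReLU; the output layer is kept linear (an assumption).
    """
    activation = source_frame
    for weights, bias in layers[:-1]:
        activation = relu(dense(weights, bias, activation))
    weights, bias = layers[-1]
    return dense(weights, bias, activation)


rng = random.Random(0)
FRAME_DIM, HIDDEN_DIM = 8, 16  # hypothetical sizes, not taken from the paper
layers = [
    init_layer(HIDDEN_DIM, FRAME_DIM, rng),
    init_layer(HIDDEN_DIM, HIDDEN_DIM, rng),
    init_layer(FRAME_DIM, HIDDEN_DIM, rng),
]
source_frame = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_DIM)]
converted_frame = convert_frame(layers, source_frame)
```

Because each frame needs only a few matrix-vector products and element-wise maxima, a forward pass of this kind is cheap per frame. In a trained system the weights would be learned from aligned source and target speaker data rather than drawn at random.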


Bibliographic reference. Azarov, Elias / Vashkevich, Maxim / Likhachov, Denis / Petrovsky, Alexander (2013): "Real-time voice conversion using artificial neural networks with rectified linear units", In INTERSPEECH-2013, 1032-1036.