ISCA Archive Interspeech 2013

Real-time voice conversion using artificial neural networks with rectified linear units

Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky

This paper presents an approach to parametric voice conversion that can be used in real-time entertainment applications. The approach is based on spectral mapping using an artificial neural network (ANN) with rectified linear units (ReLU). To overcome the oversmoothing problem, a special network configuration is proposed that utilizes temporal states of the speaker. Speech is represented using the harmonic plus noise model, whose parameters are estimated from instantaneous harmonic parameters. The proposed voice conversion technique is compared to the main alternative approaches using objective and subjective measures.
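
As a rough illustration of the spectral mapping idea named in the abstract, the sketch below passes one frame of source-speaker features through a small feed-forward network with ReLU hidden units and a linear output layer. It is not the authors' implementation: the layer sizes, the random untrained weights, the feature values, and the names `relu`, `affine`, `init_layer` and `convert_frame` are all assumptions made for the example, and the paper's special configuration based on temporal states is not reproduced here.

```python
import random


def relu(x):
    """Rectified linear unit, applied element-wise to a list of floats."""
    return [v if v > 0.0 else 0.0 for v in x]


def affine(weights, bias, x):
    """Compute weights @ x + bias for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Create small random weights and zero biases (untrained, illustrative)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def convert_frame(source_frame, layers):
    """Map one source feature frame to a target-speaker feature frame.

    Hidden layers use ReLU; the final layer is linear so the output
    can take any real value.
    """
    h = source_frame
    for weights, bias in layers[:-1]:
        h = relu(affine(weights, bias, h))
    weights, bias = layers[-1]
    return affine(weights, bias, h)


# Hypothetical dimensions, chosen only to keep the example small.
FEATURE_DIM = 4
HIDDEN_DIM = 8

rng = random.Random(0)
layers = [
    init_layer(HIDDEN_DIM, FEATURE_DIM, rng),   # input -> hidden (ReLU)
    init_layer(FEATURE_DIM, HIDDEN_DIM, rng),   # hidden -> output (linear)
]

source_frame = [0.5, -0.2, 0.1, 0.3]  # made-up spectral features for one frame
converted = convert_frame(source_frame, layers)
```

In a real-time setting a function like `convert_frame` would be called once per analysis frame. A trained system would learn the weights from aligned source and target speech rather than drawing them at random.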


doi: 10.21437/Interspeech.2013-113

Cite as: Azarov, E., Vashkevich, M., Likhachov, D., Petrovsky, A. (2013) Real-time voice conversion using artificial neural networks with rectified linear units. Proc. Interspeech 2013, 1032-1036, doi: 10.21437/Interspeech.2013-113

@inproceedings{azarov13b_interspeech,
  author={Elias Azarov and Maxim Vashkevich and Denis Likhachov and Alexander Petrovsky},
  title={{Real-time voice conversion using artificial neural networks with rectified linear units}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1032--1036},
  doi={10.21437/Interspeech.2013-113}
}