ISCA Archive Interspeech 2009

Efficient modeling of temporal structure of speech for applications in voice transformation

Binh Phu Nguyen, Masato Akagi

Voice transformation aims to change the style of a given utterance. Most voice transformation methods process speech signals in the time-frequency domain. In the time domain, conventional methods that process spectral information do not consider the relations between neighboring frames. Unexpected modifications can therefore introduce discontinuities between frames, which degrade the quality of the transformed speech. This paper proposes a new model of the temporal structure of speech that ensures the smoothness of the transformed speech and thereby improves its quality. Specifically, we propose an improvement of the temporal decomposition (TD) technique, which decomposes a speech signal into event targets and event functions, to model the temporal structure of speech. TD is used to control the spectral dynamics and to ensure the smoothness of the transformed speech. We investigate TD in two applications: concatenative speech synthesis and spectral voice conversion. Experimental results confirm that TD improves the quality of the transformed speech.
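
As a rough illustration of the decomposition into event targets and event functions, the sketch below rebuilds a spectral parameter trajectory as a weighted sum of event targets, y(n) = sum_k a_k * phi_k(n), which is the generic TD model. It does not reproduce the improved TD proposed in the paper. The triangular event functions, the function names, and the toy target values are all assumptions made for illustration; in actual TD the event functions are estimated from the speech signal.

```python
def triangular_event_functions(event_frames, num_frames):
    """Hypothetical event functions: each peaks at 1 at its own event frame
    and falls linearly to 0 at the neighboring event frames."""
    functions = []
    for k, center in enumerate(event_frames):
        left = event_frames[k - 1] if k > 0 else None
        right = event_frames[k + 1] if k + 1 < len(event_frames) else None
        phi = []
        for n in range(num_frames):
            if n == center:
                w = 1.0
            elif n < center:
                # Rising edge; hold at 1 before the first event.
                w = 1.0 if left is None else max(0.0, (n - left) / (center - left))
            else:
                # Falling edge; hold at 1 after the last event.
                w = 1.0 if right is None else max(0.0, (right - n) / (right - center))
            phi.append(w)
        functions.append(phi)
    return functions


def reconstruct(targets, functions):
    """Generic TD model: frame n is the sum over events k of
    targets[k] * functions[k][n]."""
    num_frames = len(functions[0])
    dim = len(targets[0])
    return [
        [sum(targets[k][d] * functions[k][n] for k in range(len(targets)))
         for d in range(dim)]
        for n in range(num_frames)
    ]


# Toy data (assumed): three events over nine frames, 2-dimensional parameters.
event_frames = [0, 4, 8]
functions = triangular_event_functions(event_frames, num_frames=9)
targets = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0]]
trajectory = reconstruct(targets, functions)

# Modifying one event target changes the trajectory gradually over the
# neighboring frames rather than at a single frame.
modified_targets = [[1.0, 0.0], [5.0, 2.0], [2.0, 1.0]]
modified_trajectory = reconstruct(modified_targets, functions)
```

Because each frame is a blend of neighboring event targets, editing a target in this sketch spreads the change smoothly across frames, which is the kind of inter-frame continuity the abstract attributes to TD.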


doi: 10.21437/Interspeech.2009-487

Cite as: Nguyen, B.P., Akagi, M. (2009) Efficient modeling of temporal structure of speech for applications in voice transformation. Proc. Interspeech 2009, 1631-1634, doi: 10.21437/Interspeech.2009-487

@inproceedings{nguyen09b_interspeech,
  author={Binh Phu Nguyen and Masato Akagi},
  title={{Efficient modeling of temporal structure of speech for applications in voice transformation}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1631--1634},
  doi={10.21437/Interspeech.2009-487}
}