Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Quality Improvement of PSOLA Analysis-Synthesis Using Partial Zero-Phase Conversion

Nobuaki Minematsu (1), Seiichi Nakagawa (2)

(1) Graduate School of Engineering, University of Tokyo, Japan
(2) Department of Information and Computer Sciences, Toyohashi University of Technology, Japan

This paper discusses two issues in improving the quality of F0-modified speech based on PSOLA analysis-synthesis. Previous studies [1][2] pointed out that the location of the PSOLA window influences the quality of synthesized speech, and one of them claimed that the center of the window should be located at a pitch pulse in the source waveform. However, pitch pulse detection sometimes fails due to undesired acoustic events. In this paper, several methods are experimentally examined to reduce pitch pulse detection errors. Even when detection is done correctly, F0-modified re-synthesized speech sometimes exhibits "echoes" in the re-arranged waveforms. These are mainly caused by a pitch pulse with low sharpness, or by a pitch pulse that has two relatively high pulses, which are not pitch pulses, before and after it. To suppress the echoes with little loss of naturalness, partial zero-phase conversion is proposed here. Experiments show that the proposed methods are effective in improving the quality of re-synthesized speech.
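
The abstract does not specify how the "partial" conversion is defined, so the following is only a minimal sketch under one assumed reading: the phase of a single pitch-period frame is set to zero for frequency bins up to a cutoff, while the magnitude spectrum and the phase of the higher bins are left unchanged. The function name partial_zero_phase, the parameter cutoff_bin, and the use of a plain DFT are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def partial_zero_phase(frame, cutoff_bin):
    """Zero the phase of bins at or below cutoff_bin (hypothetical parameter).

    Assumption: "partial" means "only part of the frequency band".
    The magnitude of every bin is preserved.
    """
    n = len(frame)
    spectrum = dft(frame)
    for k in range(n):
        # min(k, n - k) treats bin k and its mirror n - k alike, which keeps
        # the spectrum Hermitian so that the output stays real-valued.
        if min(k, n - k) <= cutoff_bin:
            spectrum[k] = complex(abs(spectrum[k]), 0.0)
    return [v.real for v in idft(spectrum)]


if __name__ == "__main__":
    # Toy frame standing in for one windowed pitch waveform.
    frame = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    print(partial_zero_phase(frame, 1))           # low bins only
    print(partial_zero_phase(frame, len(frame) // 2))  # full zero phase
```

With cutoff_bin set to half the frame length, every bin is converted and the result is circularly symmetric about index 0, so a circular shift would be needed before re-centering the frame on the pitch pulse position for overlap-add. A negative cutoff_bin leaves the frame unchanged.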


  1. H. Kawai et al., "A study of a text-to-speech system based on waveform splicing," Technical report of IEICE, SP93-9, pp.49-54 (1993, in Japanese).
  2. Y. Arai et al., "A study on the optimal window position to extract pitch waveforms," Technical report of IEICE, SP95-8, pp.53-59 (1995, in Japanese).

