Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Quality Improvement of PSOLA Analysis-Synthesis Using Partial Zero-Phase Conversion
Nobuaki Minematsu (1), Seiichi Nakagawa (2)
(1) Graduate School of Engineering, University of Tokyo, Japan
(2) Department of Information and Computer Sciences, Toyohashi University of Technology, Japan
This paper discusses two issues in improving the quality
of F0-modified speech based upon PSOLA analysis-synthesis.
Previous studies pointed out that the location
of a PSOLA window influences the quality of
synthesized speech, and one of them claimed that the center
of a window should be located at a pitch pulse in the
source waveform. However, pitch pulse detection sometimes
fails due to undesired acoustic events. In this paper,
several methods are experimentally examined to reduce
pitch pulse detection errors. Even when the detection
is done correctly, F0-modified re-synthesized speech sometimes
contains "echoes" in the re-arranged waveforms. This
is mainly caused by a pitch pulse with small sharpness, or
by a pitch pulse with two relatively high pulses, which are
not pitch pulses, before and after it. To suppress the echoes
with little loss of naturalness, partial zero/π-phase conversion
is proposed here. Experiments show the high validity of the proposed
methods in improving the quality of re-synthesized speech.
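To make the idea of zero-phase conversion concrete, the following is a minimal sketch, not the authors' implementation. It keeps the magnitude spectrum of one windowed pitch waveform and sets every phase to zero, which concentrates the energy into a single symmetric pulse. The abstract does not say how the conversion is made "partial" or how the π-phase case is handled, so neither is modeled here. The function names (`dft`, `idft`, `zero_phase`, `center`) and the example frame are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def zero_phase(frame):
    """Keep the magnitude spectrum of a real frame and set all phases to zero."""
    magnitudes = [abs(c) for c in dft(frame)]
    # A real, non-negative, symmetric spectrum yields a real, even-symmetric
    # signal whose largest sample sits at index 0.
    return [c.real for c in idft(magnitudes)]


def center(frame):
    """Rotate the frame so that index 0 moves to the middle position."""
    half = len(frame) // 2
    return frame[-half:] + frame[:-half]


# Hypothetical frame: a main pulse with two fairly high neighbouring pulses,
# the kind of shape the abstract associates with audible "echoes".
frame = [0.0, 0.6, 0.0, 1.0, 0.0, 0.7, 0.0, 0.0]
converted = center(zero_phase(frame))
print([round(v, 3) for v in converted])
```

The converted frame has the same energy as the input but a single dominant peak in the middle, so overlap-adding such frames at a new pitch period should be less likely to expose secondary pulses. Applying this to every frame would probably cost naturalness, which is presumably why the paper proposes only a partial conversion.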
- H. Kawai et al., "A study of a text-to-speech system based
on waveform splicing," Technical report of IEICE, SP93-9,
pp.49-54 (1993, in Japanese).
- Y. Arai et al., "A study on the optimal window position
to extract pitch waveforms," Technical report of IEICE,
SP95-8, pp.53-59 (1995, in Japanese).
Minematsu, Nobuaki / Nakagawa, Seiichi (2000):
"Quality improvement of PSOLA analysis-synthesis using partial zero-phase conversion",
In ICSLP-2000, vol.2, 779-782.