4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
To preserve shape invariance when modifying the pitch or time-scale of sinusoidally modelled voiced speech, the phases of the sinusoids used to model the glottal excitation are made to add coherently at estimated excitation points. Previous methods achieve this by estimating excitation phases at synthesis frame boundaries, disregarding any frequency modulation that may occur between the frame boundary and the nearest modified excitation point. This approximation can produce a significant misalignment of the excitation phases, leading to distortion of the temporal structure of the synthetic speech. In this paper, a shape-invariant technique is proposed which aligns the excitation phases at the excitation points themselves, whilst allowing for variations in the frequency of the sinusoidal components.
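The size of the phase misalignment described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes, purely for illustration, that each component's frequency varies linearly across a synthesis frame, and the function names and parameters are hypothetical. It compares the phase advance from a frame boundary to an excitation point under a constant-frequency approximation with the advance obtained by integrating the varying frequency.

```python
import math


def phase_advance_constant(f_start_hz, dt_s):
    """Phase advance (radians) over dt_s if the frequency is held at f_start_hz."""
    return 2.0 * math.pi * f_start_hz * dt_s


def phase_advance_linear(f_start_hz, f_end_hz, frame_s, dt_s):
    """Phase advance (radians) over dt_s for a frequency that ramps linearly
    from f_start_hz to f_end_hz across a frame of length frame_s.

    The linear ramp is an illustrative assumption, not taken from the paper.
    The advance is 2*pi times the integral of the frequency from 0 to dt_s.
    """
    slope_hz_per_s = (f_end_hz - f_start_hz) / frame_s
    return 2.0 * math.pi * (f_start_hz * dt_s + 0.5 * slope_hz_per_s * dt_s ** 2)


def misalignment(f_start_hz, f_end_hz, frame_s, dt_s):
    """Phase error (radians) introduced by ignoring the frequency variation."""
    return (phase_advance_linear(f_start_hz, f_end_hz, frame_s, dt_s)
            - phase_advance_constant(f_start_hz, dt_s))


if __name__ == "__main__":
    # Hypothetical case: fundamental rising from 100 Hz to 110 Hz over a
    # 10 ms frame, with the excitation point 5 ms after the frame boundary.
    for k in (1, 5, 10, 20):
        err = misalignment(100.0 * k, 110.0 * k, 0.010, 0.005)
        print(f"harmonic {k:2d}: misalignment = {err:.4f} rad")
```

Under these assumed values the error scales linearly with harmonic number and reaches a quarter of a cycle (π/2 radians) at the 20th harmonic, so the higher harmonics no longer add coherently at the excitation point even though the change in the fundamental is modest.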
Bibliographic reference. Pollard, M. P. / Cheetham, B. M. G. / Goodyear, C. C. / Edgington, Mike D. / Lowry, A. (1996): "Enhanced shape-invariant pitch and time-scale modification for concatenative speech synthesis", In ICSLP-1996, 1433-1436.