In source-filter models of speech production, the residual signal (what remains after passing the speech signal through the inverse filter) contains important information for generating natural-sounding re-synthesized speech. Typically, the voiced regions of residual signals are regarded as a mixture of a glottal pulse and noise. This paper introduces a novel approach that represents the noise component of voiced regions of residual signals through autoregressive filtering of multipulse sequences. The positions and amplitudes of the non-zero samples of these multipulse signals are optimized through a closed-loop procedure. The method is applied to excitation modeling in statistical parametric synthesis. Experimental results indicate that multipulse-based construction of the noise component eliminates the need for run-time ad hoc procedures such as high-pass filtering and time modulation, which are common in excitation models for statistical parametric synthesizers, with no loss in synthesized speech quality.
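To make "autoregressive filtering of multipulse sequences, optimized in closed loop" concrete, here is a minimal sketch. It assumes a standard greedy analysis-by-synthesis search in the style of classic multipulse linear predictive coding, followed by a joint least-squares refit of the amplitudes. The abstract does not specify the paper's actual closed-loop criterion, error weighting, or AR order, so treat this as an illustration rather than the authors' method. The function names (`ar_impulse_response`, `shifted`, `multipulse_fit`) and the filter convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are hypothetical choices made for this example.

```python
import numpy as np


def ar_impulse_response(a, n):
    """Impulse response (length n) of the all-pole filter 1 / A(z),
    with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p (assumed convention)."""
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if i - k >= 0:
                acc -= ak * h[i - k]
        h[i] = acc
    return h


def shifted(h, pos, n):
    """Response to a unit pulse at sample `pos`, truncated to n samples."""
    out = np.zeros(n)
    out[pos:] = h[: n - pos]
    return out


def multipulse_fit(target, a, num_pulses):
    """Greedy closed-loop multipulse search (hypothetical sketch).

    Chooses pulse positions one at a time so that the AR-filtered pulse
    train best matches `target` in the squared-error sense, then jointly
    re-optimizes all amplitudes by least squares.
    Returns (excitation, synthesized_signal).
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    h = ar_impulse_response(a, n)
    # Column p = filtered unit pulse placed at position p.
    basis = np.stack([shifted(h, p, n) for p in range(n)], axis=1)
    energy = np.sum(basis ** 2, axis=0)  # > 0 since h[0] = 1
    residual = target.copy()
    positions = []
    for _ in range(num_pulses):
        corr = basis.T @ residual
        # Error reduction from the best single amplitude at each position.
        score = corr ** 2 / energy
        if positions:
            score[positions] = -np.inf  # each position used at most once
        p = int(np.argmax(score))
        positions.append(p)
        residual -= (corr[p] / energy[p]) * basis[:, p]
    # Closed-loop refinement: joint least-squares fit of all amplitudes.
    B = basis[:, positions]
    amps, *_ = np.linalg.lstsq(B, target, rcond=None)
    excitation = np.zeros(n)
    excitation[positions] = amps
    return excitation, B @ amps
```

For example, `multipulse_fit(noise_segment, a, 8)` would return an 8-pulse excitation whose AR-filtered output approximates `noise_segment`. Because the greedy picks for K pulses extend the picks for K-1 and the amplitudes are refit jointly, the approximation error cannot grow as pulses are added.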
Bibliographic reference. Maia, Ranniery / Zen, Heiga / Knill, Kate / Gales, M. J. F. / Buchholz, Sabine (2011): "Multipulse sequences for residual signal modeling", In INTERSPEECH-2011, 1833-1836.