We discuss the use of low-dimensional physical models of the voice source for speech coding and processing applications. A class of waveform-adaptive dynamic glottal models and the associated parameter tracking procedures are presented. The models and analysis procedures are assessed through signal transformations on recorded speech, obtained by fitting the model to the data and then acting on the physically oriented parameters of the voice source. The proposed class of models provides, in principle, a tool both for estimating glottal source signals and for encoding the speech signal for transformation purposes. The application of the model to time stretching and to frequency control (pitch shifting) is also illustrated. The experiments show that copy synthesis is perceptually almost indistinguishable from the target, and that time stretching and pitch extrapolation effects can be obtained with simple control strategies.
Index Terms: speech synthesis, glottal modeling, speech coding, physical modeling
target speech (8 kHz, 16 bit, mono)
estimated glottal flow derivative (synthesis after training)
copy synthesis (convolution of the estimated glottal flow derivative with the estimated time-varying vocal tract filter)
copy synthesis with time axis compression
copy synthesis with time axis expansion
copy synthesis with pitch shifting (a)
copy synthesis with pitch shifting (b)
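
The copy synthesis sample listed above is described as the estimated glottal flow derivative passed through the estimated time-varying vocal tract filter. The following is a minimal sketch of that filtering step only, not the authors' implementation. It assumes, purely for illustration, that the vocal tract is represented as an all-pole filter whose coefficients are held constant within fixed-length frames; the function name `time_varying_allpole` and its parameters are hypothetical.

```python
def time_varying_allpole(source, frame_coeffs, frame_len):
    """Filter `source` with an all-pole filter whose coefficients change per frame.

    Assumed (hypothetical) conventions:
      - frame_coeffs[k] = [a1, ..., ap] holds the coefficients for frame k
      - the difference equation is y[n] = x[n] - sum_i a_i * y[n - i]
      - frames are `frame_len` samples long; the last coefficient set is
        reused if the source is longer than the frames provided
    The output history is carried across frame boundaries, so the filter
    memory is not reset when the coefficients switch.
    """
    order = len(frame_coeffs[0])
    out = []
    for n, x in enumerate(source):
        # Pick the coefficient set of the frame that sample n falls into.
        k = min(n // frame_len, len(frame_coeffs) - 1)
        a = frame_coeffs[k]
        acc = x
        for i in range(1, order + 1):
            if n - i >= 0:
                acc -= a[i - 1] * out[n - i]
        out.append(acc)
    return out


# Toy input: a unit impulse stands in for the estimated glottal flow derivative.
excitation = [1.0, 0.0, 0.0, 0.0]
# One-pole filter in the first frame, pass-through in the second.
coeffs = [[-0.5], [0.0]]
speech = time_varying_allpole(excitation, coeffs, frame_len=2)
```

In this toy case the pole shapes the first two output samples and the second frame passes the (zero) input through unchanged. In an actual analysis-synthesis chain, the excitation would come from the fitted glottal model and the coefficients from the vocal tract estimate.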
Bibliographic reference. Drioli, Carlo / Calanca, Andrea (2012): "Speech modeling and processing by low-dimensional dynamic glottal models", In INTERSPEECH-2012, 1608-1611.