Third International Conference on Spoken Language Processing (ICSLP 94)
We propose a composite signal model whose general form is valid for both the glottal pulse and the speech signal. The model consists of two linear autoregressive sub-models, which are fitted to the open-phase and return-phase components of the glottal pulse or speech signal, respectively. For the glottal pulse, the orders of the two sub-models are 2 and 1, respectively. For the speech signal, the orders are higher in order to take into account the effects of vocal tract resonance. The switch from one sub-model to the other occurs when the signal crosses a critical threshold. The advantage of this switching rule is that the number and positions of the thresholds are independent of the position and length of the analysis window. As a result, the optimal threshold position, i.e. the best possible segmentation into open-phase and return-phase components, can be found automatically by means of a conventional optimizer. Results show that the proposed model enables the glottal pulse to be segmented automatically and the sub-models to be fitted within an analysis window positioned asynchronously with respect to the glottal excitation. Similarly, when applied to the speech signal within such an excitation-asynchronous analysis window, the model automatically provides the glottal cycle lengths, the open-phase and return-phase components of the speech signal, and the open-phase and return-phase formant frequencies.
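The threshold-switching idea can be illustrated with a minimal sketch of a two-regime self-excited threshold autoregressive fit. It is not the author's implementation, and several details are assumptions made for illustration: the regime for sample x[n] is chosen by comparing the previous sample x[n-1] with the threshold; the order-2 sub-model is used at or above the threshold and the order-1 sub-model below it; each sub-model includes an intercept and is fitted by ordinary least squares; and a simple grid search over candidate thresholds stands in for the unnamed "conventional optimizer". The function names `setar_rss` and `fit_threshold` are hypothetical.

```python
import numpy as np


def setar_rss(x, threshold, order_hi=2, order_lo=1, delay=1):
    """Residual sum of squares of a two-regime threshold AR fit (illustrative sketch).

    Assumption: x[n] is predicted by the order_hi sub-model when the delayed
    sample x[n - delay] is at or above `threshold`, and by the order_lo
    sub-model otherwise. Each sub-model has an intercept and is fitted by
    ordinary least squares.
    """
    x = np.asarray(x, dtype=float)
    p = max(order_hi, order_lo, delay)
    n_idx = np.arange(p, len(x))
    upper = x[n_idx - delay] >= threshold  # self-excited switch: the signal selects the regime
    rss = 0.0
    for mask, order in ((upper, order_hi), (~upper, order_lo)):
        idx = n_idx[mask]
        if len(idx) <= order + 1:
            return np.inf  # too few samples to fit this sub-model
        # Design matrix: intercept plus lagged samples x[n-1], ..., x[n-order].
        A = np.column_stack([np.ones(len(idx))] + [x[idx - k] for k in range(1, order + 1)])
        coef, *_ = np.linalg.lstsq(A, x[idx], rcond=None)
        rss += float(np.sum((x[idx] - A @ coef) ** 2))
    return rss


def fit_threshold(x, candidates, **kwargs):
    """Grid search over candidate thresholds (a stand-in for a conventional optimizer)."""
    scores = [setar_rss(x, t, **kwargs) for t in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]
```

Because the regime of each sample depends only on the signal value, not on where the sample sits in the window, the same fit can be run on any window position or length; only the set of samples that fall on either side of the threshold changes. For example, `fit_threshold(x, np.quantile(x, np.linspace(0.15, 0.85, 71)))` scans thresholds between the 15th and 85th percentiles of the windowed signal `x`.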
Bibliographic reference. Schoentgen, Jean (1994): "Self excited threshold auto-regressive models of the glottal pulse and the speech signal", In ICSLP-1994, 1063-1066.