5th International Conference on Spoken Language Processing
Over frames of short duration, filtered speech may be described as a finite linear combination of sinusoidal components. For a frame of voiced speech, the frequencies are taken to be harmonics of a fundamental frequency. We further assume that the speech samples are observed in additive zero-mean white noise, which gives a standard signal-plus-noise model. This model depends nonlinearly on the frequencies of the sinusoids but linearly on their coefficients. We use subspace line spectral estimation methods of Pisarenko and Prony type to estimate the frequencies, and we apply the results to voiced-unvoiced classification and pitch estimation, followed by analysis of the speech waveform into its sinusoidal components.
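As a rough illustration of the Pisarenko-type estimation step mentioned above, the following minimal sketch estimates the frequencies of a known number of real sinusoids in white noise from the eigenvector of the autocorrelation matrix that belongs to its smallest eigenvalue. It is not the authors' implementation: the function name `pisarenko_frequencies`, its parameters, the use of a biased autocorrelation estimate, and the assumption that the number of sinusoids is known in advance are all illustrative choices made here.

```python
import numpy as np


def pisarenko_frequencies(x, num_sinusoids, fs):
    """Estimate sinusoid frequencies (Hz) by Pisarenko harmonic decomposition.

    Hypothetical helper for illustration only. Assumes x is a real-valued
    frame containing `num_sinusoids` sinusoids plus additive white noise.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = 2 * num_sinusoids + 1  # matrix order: one more than the signal subspace dimension

    # Biased autocorrelation estimates r[0], ..., r[m-1].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(m)])

    # Symmetric Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    R = r[lags]

    # eigh returns eigenvalues in ascending order, so column 0 is the
    # eigenvector of the smallest eigenvalue (the noise subspace).
    _, eigvecs = np.linalg.eigh(R)
    noise_vec = eigvecs[:, 0]

    # The roots of the polynomial with these coefficients lie at
    # exp(+/- j*omega_k); keep one root from each conjugate pair.
    angles = np.angle(np.roots(noise_vec))
    omegas = np.sort(angles[angles > 0])
    return omegas * fs / (2 * np.pi)
```

For a voiced frame, the returned frequencies could then be checked for an approximately harmonic spacing to support voiced-unvoiced classification and pitch estimation. The estimate degrades when the frequencies are closely spaced relative to the frame length, which is one reason a sketch like this should be treated as a starting point only.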
Bibliographic reference. Malik, Najam / Holmes, W. Harvey (1998): "Speech analysis by subspace methods of spectral line estimation", In ICSLP-1998, paper 1026.