5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

On Frequency Averaging For Spectral Analysis In Speech Recognition

Climent Nadeu, Felix Galindo, Jaume Padrell

Universitat Politecnica de Catalunya, Spain

Many speech recognition systems represent the speech signal with logarithmic filter-bank energies or a linear transformation of them. Each of those energies is usually computed as a weighted average of the periodogram samples that lie in the corresponding frequency band. In this work, we attempt to gain insight into the statistical properties of the frequency-averaged periodogram (FAP), of which those energies are samples. We show that the FAP is statistically and asymptotically equivalent to a multiwindow estimator that arises from Thomson's optimization approach and uses orthogonal sinusoids as windows. The FAP and other multiwindow estimators are tested in a speech recognition application to observe the influence of several design factors. In particular, a technique that is as computationally simple as the FAP, and that is equivalent to using multiple cosine windows, emerges as an alternative worth considering.
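
The two estimators compared in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the uniform averaging weights, the zero-padded handling of the band edges, and the specific sine window family sqrt(2/(N+1)) sin(pi k t/(N+1)) are all assumptions chosen for illustration; the abstract only states that the windows are orthogonal sinusoids.

```python
import numpy as np


def periodogram(frame):
    """Periodogram of one frame: squared DFT magnitude divided by frame length."""
    frame = np.asarray(frame, dtype=float)
    return np.abs(np.fft.rfft(frame)) ** 2 / len(frame)


def frequency_averaged_periodogram(frame, half_width):
    """Smooth the periodogram along frequency with uniform weights.

    Assumption: uniform weights over 2*half_width + 1 neighbouring bins.
    Bins near the two ends are averaged with implicit zeros, so they are
    biased low in this sketch.
    """
    p = periodogram(frame)
    weights = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    return np.convolve(p, weights, mode="same")


def sine_windows(n, num_windows):
    """Return num_windows orthonormal sine windows of length n, one per row.

    Assumption: this particular family is used only as an example of
    orthogonal sinusoidal windows.
    """
    t = np.arange(1, n + 1)
    k = np.arange(1, num_windows + 1)[:, None]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * t / (n + 1))


def sine_multiwindow_estimate(frame, num_windows):
    """Average the spectra obtained with each sine window (multiwindow estimate)."""
    frame = np.asarray(frame, dtype=float)
    windows = sine_windows(len(frame), num_windows)
    # Each row of `windows * frame` is the frame tapered by one window.
    spectra = np.abs(np.fft.rfft(windows * frame, axis=1)) ** 2
    return spectra.mean(axis=0)
```

Calling `frequency_averaged_periodogram(frame, 4)` and `sine_multiwindow_estimate(frame, 9)` on the same frame gives two smoothed spectra on the same scale, because each sine window has unit energy, matching the 1/N normalization of the periodogram. Both fluctuate much less across frequency than the raw periodogram.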

Bibliographic reference.  Nadeu, Climent / Galindo, Felix / Padrell, Jaume (1998): "On frequency averaging for spectral analysis in speech recognition", In ICSLP-1998, paper 1135.