ISCA Archive ICSLP 1998

On frequency averaging for spectral analysis in speech recognition

Climent Nadeu, Felix Galindo, Jaume Padrell

Many speech recognition systems use logarithmic filter-bank energies, or a linear transformation of them, to represent the speech signal. Each of those energies is routinely computed as a weighted average of the periodogram samples that lie in the corresponding frequency band. In this work, we attempt to gain insight into the statistical properties of the frequency-averaged periodogram (FAP) from which those energies are samples. We show that the FAP is statistically and asymptotically equivalent to a multiwindow estimator that arises from Thomson's optimization approach and uses orthogonal sinusoids as windows. The FAP and other multiwindow estimators are tested in a speech recognition application, and the influence of several design factors is observed. In particular, a technique that is as computationally simple as the FAP, and that is equivalent to using multiple cosine windows, emerges as an alternative worth considering.
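
The band-energy computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the triangular weights, the band edges, the test sinusoid, and the small floor applied before the logarithm are all assumptions chosen for illustration.

```python
import cmath
import math


def periodogram(frame):
    """Return |DFT|^2 / N for bins 0..N//2 of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_log_energy(pgram, lo, hi, weights=None, floor=1e-12):
    """Log of the weighted average of periodogram bins lo..hi (inclusive).

    The floor is an assumption of this sketch; it only guards against log(0).
    """
    bins = pgram[lo:hi + 1]
    if weights is None:
        weights = [1.0] * len(bins)  # plain (uniform) frequency averaging
    avg = sum(w * p for w, p in zip(weights, bins)) / sum(weights)
    return math.log(max(avg, floor))


# Hypothetical test frame: a sinusoid centred exactly on bin 8 of 64 samples.
N = 64
frame = [math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
pgram = periodogram(frame)

tri = [1, 2, 3, 2, 1]  # assumed triangular band weights
in_band = band_log_energy(pgram, 6, 10, weights=tri)    # band containing the tone
out_band = band_log_energy(pgram, 20, 24, weights=tri)  # band away from the tone
```

Here `in_band` exceeds `out_band` because only the first band contains the sinusoid. A multiwindow estimator of the kind the abstract compares against would instead average several periodograms of the same frame, each computed with a different orthogonal window.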


doi: 10.21437/ICSLP.1998-541

Cite as: Nadeu, C., Galindo, F., Padrell, J. (1998) On frequency averaging for spectral analysis in speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 1135, doi: 10.21437/ICSLP.1998-541

@inproceedings{nadeu98_icslp,
  author={Climent Nadeu and Felix Galindo and Jaume Padrell},
  title={{On frequency averaging for spectral analysis in speech recognition}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 1135},
  doi={10.21437/ICSLP.1998-541}
}