Many speech recognition systems represent the speech signal with logarithmic filter-bank energies or a linear transformation of them. Each of those energies is usually computed as a weighted average of the periodogram samples that lie in the corresponding frequency band. In this work, we seek insight into the statistical properties of the frequency-averaged periodogram (FAP), of which those energies are samples. We show that the FAP is statistically and asymptotically equivalent to a multiwindow estimator that arises from Thomson's optimization approach and uses orthogonal sinusoids as windows. The FAP and other multiwindow estimators are tested in a speech recognition application, and the influence of several design factors is observed. In particular, a technique that is as computationally simple as the FAP, and that is equivalent to using multiple cosine windows, appears to be an alternative worth considering.
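The two estimators named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The linearly spaced triangular weights, the frame length, the FFT size, the number of bands, and the function names are all hypothetical choices. The sine tapers are one common family of orthogonal sinusoidal windows, assumed here for illustration; the windows derived in the paper may differ.

```python
import numpy as np


def periodogram(frame, n_fft=256):
    """Raw periodogram |X[k]|^2 / N over the non-negative frequency bins."""
    spectrum = np.fft.rfft(frame, n=n_fft)
    return np.abs(spectrum) ** 2 / len(frame)


def triangular_weights(n_bins, n_bands):
    """Hypothetical linearly spaced triangular weights; each row sums to 1."""
    edges = np.linspace(0, n_bins - 1, n_bands + 2)
    bins = np.arange(n_bins)
    weights = np.zeros((n_bands, n_bins))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        weights[b] = np.clip(np.minimum(rising, falling), 0.0, None)
        weights[b] /= weights[b].sum()
    return weights


def log_filterbank_energies(frame, n_fft=256, n_bands=20, floor=1e-12):
    """Frequency averaging: weighted average of periodogram samples per band."""
    p = periodogram(frame, n_fft)
    w = triangular_weights(len(p), n_bands)
    return np.log(np.maximum(w @ p, floor))


def sine_tapers(n, n_tapers):
    """Orthonormal sine tapers (an assumed choice of sinusoidal windows)."""
    t = np.arange(1, n + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * t / (n + 1))


def sine_taper_spectrum(frame, n_tapers=4, n_fft=256):
    """Multiwindow estimate: average of the spectra of the tapered frames."""
    tapers = sine_tapers(len(frame), n_tapers)
    spectra = np.fft.rfft(tapers * frame, n=n_fft, axis=1)
    return np.mean(np.abs(spectra) ** 2, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(200)  # stand-in for one speech frame
    print(log_filterbank_energies(frame).shape)
    print(sine_taper_spectrum(frame).shape)
```

The first estimator smooths a single periodogram across neighbouring frequency bins, while the second averages several periodograms obtained with different windows. The abstract's equivalence claim relates these two ways of reducing variance.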
Cite as: Nadeu, C., Galindo, F., Padrell, J. (1998) On frequency averaging for spectral analysis in speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 1135, doi: 10.21437/ICSLP.1998-541
@inproceedings{nadeu98_icslp, author={Climent Nadeu and Felix Galindo and Jaume Padrell}, title={{On frequency averaging for spectral analysis in speech recognition}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 1135}, doi={10.21437/ICSLP.1998-541} }