In this paper, we present an approach to generating speaker-invariant features for automatic speech recognition (ASR) using the idea of the spectral centre of gravity (CG). The approach is based on the observation that if two signals are delayed versions of one another, then their CGs differ by the same delay. We exploit this idea by shifting the mel-warped, log-compressed spectra according to the estimated CG to obtain speaker-invariant features. The use of such speaker-invariant, or normalised, features helps improve the recognition performance of speaker-independent ASR. We show that the proposed approach is computationally efficient compared with a commonly used normalisation method, Vocal Tract Length Normalisation (VTLN). We present normalisation results showing that the performance of the proposed approach is comparable to that of conventional VTLN while retaining this computational advantage.
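The shift property that the abstract relies on can be illustrated with a small numerical sketch. This is not the authors' implementation: the function names `centre_of_gravity` and `shift_to_reference` are hypothetical, the CG is assumed to be the first moment of a non-negative spectrum over bin indices, and a toy Gaussian-shaped envelope stands in for a mel-warped, log-compressed spectrum.

```python
import numpy as np


def centre_of_gravity(spectrum):
    """First moment of a non-negative spectrum, in units of bins (assumed definition)."""
    spectrum = np.asarray(spectrum, dtype=float)
    bins = np.arange(spectrum.size)
    return float(np.sum(bins * spectrum) / np.sum(spectrum))


def shift_to_reference(spectrum, reference_cg):
    """Shift a spectrum along the bin axis so that its CG matches reference_cg."""
    spectrum = np.asarray(spectrum, dtype=float)
    bins = np.arange(spectrum.size)
    offset = centre_of_gravity(spectrum) - reference_cg
    # Resample at bins + offset; positions outside the bin range are set to 0.
    return np.interp(bins + offset, bins, spectrum, left=0.0, right=0.0)


# Toy envelope standing in for one speaker's warped spectrum (illustrative only).
bins = np.arange(64)
reference = np.exp(-0.5 * ((bins - 20) / 3.0) ** 2)

# A second "speaker" whose envelope is the same shape, displaced by 5 bins.
shifted = np.roll(reference, 5)

cg_ref = centre_of_gravity(reference)
cg_shifted = centre_of_gravity(shifted)

# The CGs differ by (approximately) the displacement between the envelopes.
print(cg_shifted - cg_ref)

# Shifting by the CG difference lines the two envelopes up again.
aligned = shift_to_reference(shifted, cg_ref)
print(np.max(np.abs(aligned - reference)))
```

In this sketch the normalisation needs only one CG estimate and one shift per spectrum. That is consistent with the efficiency argument in the abstract, although the actual cost comparison with VTLN is the one reported in the paper, not something this toy example measures.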
Bibliographic reference. Sanand, D. R. / Balaji, V. / Sandhya, Rani R. / Umesh, S. (2008): "Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition", In INTERSPEECH-2008, 2258-2261.