EUROSPEECH 2001 Scandinavia
This paper presents a significant modification of our previously proposed speech recognizer front-end based on perceptual harmonic cepstral coefficients. The spectrum is split into two frequency bands, which correspond to the harmonic and non-harmonic components. A weighting function, which depends both on the voiced/unvoiced/transitional classification and on the prominence of the harmonic structure, is applied to the harmonic band and ensures an accurate representation of the spectral envelope of voiced and transitional speech. A conventional smoothed spectrum is used in the non-harmonic band. The mixed spectrum undergoes mel-scaled band-pass filtering, and the log-energy of each filter's output is discrete cosine transformed to produce cepstral coefficients. Experiments with Mandarin digit and E-set databases show significant recognition gains over plain perceptual harmonic cepstral coefficients and considerable gains over standard techniques.
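The processing chain described above (band split, weighted harmonic band, mel-scaled filtering, log-energy, discrete cosine transform) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the band boundary, the weighting function, or the harmonic-band spectrum are computed, so `cutoff_hz`, `weights`, and `harmonic_spec` are hypothetical inputs supplied by the caller. The triangular filter shape, the mel formula, the filter count, the sampling rate, and the placement of the harmonic band below the cutoff are also assumptions made only for illustration.

```python
import math

def hz_to_mel(f):
    # A common mel-scale formula (assumed here; the abstract does not name one).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters evenly spaced on the mel scale, one row per filter."""
    nyquist = sample_rate / 2.0
    bin_freqs = [nyquist * b / (n_bins - 1) for b in range(n_bins)]
    mel_step = hz_to_mel(nyquist) / (n_filters + 1)
    edges = [mel_to_hz(i * mel_step) for i in range(n_filters + 2)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        # Rising slope up to the centre frequency, falling slope after it.
        row = [max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
               for f in bin_freqs]
        bank.append(row)
    return bank

def split_band_cepstrum(harmonic_spec, smoothed_spec, weights, cutoff_hz,
                        sample_rate=8000, n_filters=20, n_ceps=12):
    """Hypothetical sketch: mix two spectra by band, then mel -> log -> DCT.

    harmonic_spec, smoothed_spec, weights: per-bin sequences of equal length.
    cutoff_hz: assumed boundary; bins below it form the harmonic band.
    """
    n_bins = len(smoothed_spec)
    nyquist = sample_rate / 2.0
    mixed = []
    for b in range(n_bins):
        freq = nyquist * b / (n_bins - 1)
        if freq < cutoff_hz:
            # Harmonic band: weighted harmonic spectrum.
            mixed.append(weights[b] * harmonic_spec[b])
        else:
            # Non-harmonic band: conventional smoothed spectrum.
            mixed.append(smoothed_spec[b])
    bank = mel_filterbank(n_filters, n_bins, sample_rate)
    # Log-energy of each filter output; the small floor avoids log(0).
    log_e = [math.log(sum(w * s for w, s in zip(row, mixed)) + 1e-10)
             for row in bank]
    # Unnormalized DCT-II of the log filterbank energies.
    return [sum(log_e[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_filters))
                for n in range(n_filters))
            for k in range(n_ceps)]
```

For example, calling `split_band_cepstrum(h, s, w, 1000.0)` with three 129-element per-bin lists returns 12 coefficients. Setting `cutoff_hz` to 0 reduces the sketch to a conventional mel-cepstrum of the smoothed spectrum, which makes the contribution of the harmonic band easy to isolate.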
Bibliographic reference. Gu, Liang / Rose, Kenneth (2001): "Split-band perceptual harmonic cepstral coefficients as acoustic features for speech recognition", In EUROSPEECH-2001, 583-586.