Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Perceptual Harmonic Cepstral Coefficients as the Front-End for Speech Recognition

Liang Gu, Kenneth Rose

Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA

Perceptual harmonic cepstral coefficients (PHCC) are proposed as front-end features for speech recognition. Pitch estimation and classification into voiced, unvoiced, and transitional speech are performed by a spectro-temporal autocorrelation technique. A peak-picking algorithm then precisely locates the pitch harmonics. A weighting function that depends on the voicing class and the harmonic locations is applied to the power spectrum to ensure an accurate representation of the spectral envelope of voiced speech. The harmonics-weighted power spectrum undergoes mel-scaled band-pass filtering, and the log-energies of the filter outputs are transformed by a discrete cosine transform (DCT) to produce cepstral coefficients. For perceptual reasons, within-filter cube-root amplitude compression is applied to reduce amplitude variation without compromising gain invariance. Experiments show substantial recognition gains of PHCC over mel-frequency cepstral coefficients (MFCC), with error-rate reductions of 48% on the Mandarin digit database and 15% on the E-set.
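The post-pitch stages of the pipeline above can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pitch f0 and the voicing label are taken as given (the spectro-temporal autocorrelation estimator is not reproduced), and the function names (`pick_harmonic_peaks`, `mel_filterbank`, `phcc`), the peak-search tolerance `tol`, the per-class weights in `CLASS_FLOOR`, the filter and coefficient counts, and the exact form of the within-filter cube-root compression are all hypothetical choices.

```python
import math

# Hypothetical weights for non-harmonic bins per voicing class (not from the paper):
# harmonic peaks keep weight 1.0; voiced frames attenuate the bins between harmonics,
# unvoiced frames leave the spectrum unweighted.
CLASS_FLOOR = {"voiced": 0.1, "transitional": 0.5, "unvoiced": 1.0}


def hz_to_mel(f):
    # A common mel-scale formula; the paper's exact variant is an assumption.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def pick_harmonic_peaks(power, f0, fs, tol=0.2):
    """Index of the largest bin within +/- tol*f0 of each harmonic k*f0 below fs/2."""
    if f0 <= 0:
        return []
    n_bins = len(power)
    bin_hz = fs / (2 * (n_bins - 1))  # power holds bins 0..n_fft/2
    peaks = []
    k = 1
    while k * f0 < fs / 2:
        lo = max(0, math.floor((k - tol) * f0 / bin_hz))
        hi = min(n_bins - 1, math.ceil((k + tol) * f0 / bin_hz))
        peaks.append(max(range(lo, hi + 1), key=lambda b: power[b]))
        k += 1
    return peaks


def mel_filterbank(n_filters, n_bins, fs):
    """Triangular filters with centers equally spaced on the mel scale, 0..fs/2."""
    m_hi = hz_to_mel(fs / 2)
    edges = [mel_to_hz(m_hi * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = fs / (2 * (n_bins - 1))
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            f = b * bin_hz
            if left < f <= center:
                row.append((f - left) / (center - left))
            elif center < f < right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def phcc(power, f0, voicing, fs, n_filters=20, n_ceps=12, eps=1e-10):
    """Sketch of PHCC for one frame: power spectrum (bins 0..n_fft/2), pitch f0 in Hz,
    voicing label in CLASS_FLOOR."""
    n_bins = len(power)
    peaks = pick_harmonic_peaks(power, f0, fs) if voicing != "unvoiced" else []
    off_weight = CLASS_FLOOR[voicing]
    weights = [off_weight] * n_bins
    for b in peaks:
        weights[b] = 1.0  # harmonic peaks keep full weight
    log_e = []
    for row in mel_filterbank(n_filters, n_bins, fs):
        # Within-filter cube-root compression: each weighted bin is compressed
        # before summation (an assumed reading of the abstract).
        e = sum(h * (w * p) ** (1.0 / 3.0) for h, w, p in zip(row, weights, power))
        log_e.append(math.log(e + eps))
    # Unnormalized DCT-II of the log filter energies gives the cepstral coefficients.
    return [sum(le * math.cos(math.pi * q * (j + 0.5) / n_filters)
                for j, le in enumerate(log_e))
            for q in range(n_ceps)]
```

Because every weighted bin is compressed with the same cube-root exponent, multiplying the power spectrum by a constant gain g adds (1/3) ln g to every log filter energy. The DCT maps that constant offset into c0 alone, so c1 onward are unchanged, which is one way to read the gain-invariance property mentioned in the abstract.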



Bibliographic reference.  Gu, Liang / Rose, Kenneth (2000): "Perceptual harmonic cepstral coefficients as the front-end for speech recognition", In ICSLP-2000, vol.1, 309-312.