8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Highband Spectrum Envelope Estimation of Telephone Speech Using Hard/Soft-Classification

Yasheng Qian, Peter Kabal

McGill University, Canada

The bandwidth of telephone speech is generally restricted to 300-3400 Hz. This restriction has a noticeable effect on speech quality. We present an algorithm that recovers the missing highband components of telephone speech. We describe an MMSE estimator that uses hard/soft-classification to create the missing highband spectrum envelope. The classification is motivated by acoustic phonetics: voiced vowels and consonants, and unvoiced phonemes, exhibit different characteristic spectra. The classification also captures gender differences. A hard classification on phoneme characteristic parameters, such as the degree of voicing and the pitch lag, reduces the MMSE of the highband spectrum envelope estimates. An estimator using HMM-based soft-classification further reduces the distortion of the estimated highband spectrum by taking the time evolution of the spectra into account. Objective measures (mean log-spectrum distortion) and spectrograms confirm the improvement noted in informal subjective tests.
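The difference between the two classification modes can be illustrated with a minimal sketch. It is not the estimator of this paper: the two classes (voiced, unvoiced), the per-class mean envelopes, the voicing threshold of 0.5, and all numerical values are hypothetical, and a class mean stands in for the class-conditional MMSE estimate. Hard classification assigns each frame to a single class, whereas soft classification weights the class estimates by state posteriors obtained from a normalized HMM forward recursion.

```python
import numpy as np

def forward_posteriors(likelihoods, transition, prior):
    """Normalized HMM forward recursion: P(state_t | observations up to t)."""
    num_frames, num_states = likelihoods.shape
    post = np.zeros((num_frames, num_states))
    alpha = prior * likelihoods[0]
    post[0] = alpha / alpha.sum()
    for t in range(1, num_frames):
        # Predict with the transition matrix, then weight by the frame likelihood.
        alpha = likelihoods[t] * (transition.T @ post[t - 1])
        post[t] = alpha / alpha.sum()
    return post

def hard_estimate(labels, class_means):
    """Use the mean envelope of the single class assigned to each frame."""
    return class_means[labels]

def soft_estimate(posteriors, class_means):
    """Posterior-weighted average of the class mean envelopes."""
    return posteriors @ class_means

# Hypothetical toy data: 3 frames, 2 classes (0 = voiced, 1 = unvoiced).
likelihoods = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
transition = np.array([[0.9, 0.1], [0.1, 0.9]])  # rows: from-state
prior = np.array([0.5, 0.5])

# Hypothetical highband envelope per class (gains in dB for two subbands).
class_means = np.array([[-20.0, -30.0], [-10.0, -12.0]])

# Hard classification: threshold a hypothetical voicing-degree feature.
voicing = np.array([0.9, 0.45, 0.2])
labels = (voicing < 0.5).astype(int)
hard = hard_estimate(labels, class_means)

# Soft classification: weight class envelopes by HMM state posteriors.
post = forward_posteriors(likelihoods, transition, prior)
soft = soft_estimate(post, class_means)
```

In this toy example the second frame is slightly more likely under the unvoiced class when taken in isolation, but the forward recursion still favours the voiced class because the preceding frame was strongly voiced. This is the sense in which the soft estimate takes the time evolution into account.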

