INTERSPEECH 2004 - ICSLP
The bandwidth for telephony is generally defined as 300–3400 Hz. This bandwidth restriction noticeably degrades speech quality. We present an algorithm that recovers the missing highband parts of telephone speech. We describe an MMSE estimator that uses hard/soft classification to create the missing highband spectrum envelope. The classification is motivated by acoustic phonetics: voiced vowels, voiced consonants, and unvoiced phonemes exhibit different characteristic spectra. The classification also captures gender differences. Hard classification on phoneme-characteristic parameters, such as the voicing degree and the pitch lag, reduces the MMSE of the highband spectrum envelope estimates. An estimator using HMM-based soft classification further reduces the distortion of the estimated highband spectrum by taking the time evolution of the spectra into account. Objective measures (mean log-spectrum distortion) and spectrograms confirm the improvement noted in informal subjective tests.
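The classification idea can be illustrated with a minimal sketch, written under several assumptions. The abstract does not give the estimator's form, class set, or thresholds. The sketch therefore substitutes a per-class affine (linear MMSE) mapping from narrowband features to highband envelope parameters, fitted by least squares. The classes come from a voicing degree and a pitch lag (in samples at an assumed 8 kHz rate), and the thresholds are illustrative only. All function names (`classify_frame`, `fit_per_class`, `soft_estimate`, `mean_log_spectral_distortion`) are hypothetical. `soft_estimate` only shows how posterior-weighted class estimates combine, and the HMM that would supply those posteriors is not implemented. The distortion measure uses the common RMS-in-dB definition, which may differ from the paper's exact formula.

```python
import numpy as np

# Hypothetical classes and thresholds (assumptions, not from the paper).
CLASSES = ("unvoiced", "voiced_low_pitch", "voiced_high_pitch")


def classify_frame(voicing, pitch_lag, voicing_thresh=0.5, lag_thresh=50):
    """Hard classification from a voicing degree in [0, 1] and a pitch lag
    in samples (8 kHz assumed). A long lag means a low F0, typically male."""
    if voicing < voicing_thresh:
        return "unvoiced"
    return "voiced_low_pitch" if pitch_lag >= lag_thresh else "voiced_high_pitch"


def _augment(X):
    """Append a constant 1 so the estimator has an offset term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_affine_mmse(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W, i.e. the linear MMSE estimator
    on the training data. X: (N, dx) features, Y: (N, dy) envelope params."""
    W, *_ = np.linalg.lstsq(_augment(X), Y, rcond=None)
    return W


def fit_per_class(X, Y, labels):
    """Fit one affine estimator per hard class label."""
    labels = np.asarray(labels)
    return {c: fit_affine_mmse(X[labels == c], Y[labels == c])
            for c in np.unique(labels)}


def estimate(X, labels, models):
    """Hard classification: apply the estimator of each frame's class.
    Frames whose class has no model are left as NaN."""
    Xa, labels = _augment(X), np.asarray(labels)
    dy = next(iter(models.values())).shape[1]
    out = np.full((X.shape[0], dy), np.nan)
    for c, W in models.items():
        idx = labels == c
        out[idx] = Xa[idx] @ W
    return out


def soft_estimate(X, posteriors, models, classes=CLASSES):
    """Soft classification: weight each class estimate by its posterior.
    posteriors: (N, C) array whose rows sum to 1, columns ordered as
    `classes`. In an HMM-based scheme these would come from a forward
    pass over frames; here they are simply given as input."""
    Xa = _augment(X)
    return sum(posteriors[:, [k]] * (Xa @ models[c])
               for k, c in enumerate(classes))


def mean_log_spectral_distortion(P_ref, P_est, eps=1e-12):
    """Mean over frames of the RMS dB difference between power spectra.
    P_ref, P_est: (frames, bins) arrays of non-negative power values."""
    d = 10.0 * np.log10((P_ref + eps) / (P_est + eps))
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=1))))
```

On toy data where each class follows its own affine map, the per-class models fit almost exactly, while a single global model cannot. This mirrors, in a very simplified way, why classification can lower the estimation error. With one-hot posteriors, `soft_estimate` reduces to `estimate`, and smoother posteriors blend neighbouring classes.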
Bibliographic reference. Qian, Yasheng / Kabal, Peter (2004): "Highband spectrum envelope estimation of telephone speech using hard/soft-classification", In INTERSPEECH-2004, 2717-2720.