Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

A Front-End Using the Harmonicity Cue for Speech Enhancement in Loud Noise

Frédéric Berthommier, Hervé Glotin, Emmanuel Tessier

Institut de la Communication Parlée/INPG, Grenoble, France

We propose and test a speech enhancement technique based on the computation of a harmonicity index, which is nonlinearly related to the SNR. We consider this method close to a "segregation" of speech and noise, in line with the aims of the computational auditory scene analysis (CASA) approach. To evaluate performance, we quantify how accurately the target speech source is reconstructed. We vary factors including the size of the time-frequency regions to which the enhancement process is applied and the use of demodulation. We conclude that these factors have only a small effect on reconstruction accuracy; demodulation nevertheless improves the reconstruction, and processing in 4 sub-bands with a 128 ms frame duration is satisfactory. Then, using an HMM/ANN model, we evaluate recognition scores in comparison with those obtained with unprocessed noisy speech and with J-RASTA-PLP pre-processing, after training on a clean signal. At a WER of 65%, we observe a gain of 3-4 dB in loud Gaussian white noise (GWN) and of 3 dB in car noise. The best gains are obtained after training with clean processed speech, but a significant gain is also obtained without such training.
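The abstract does not say how the harmonicity index is computed. As a purely illustrative sketch, and assuming a common autocorrelation-based measure rather than the authors' actual index, the Python below takes the peak normalized autocorrelation of a frame over pitch lags between 80 and 400 Hz, with an optional crude demodulation step (half-wave rectification followed by DC removal). The function names, the pitch range, and the demodulation method are hypothetical choices; the split into 4 sub-bands is omitted, and only the 128 ms framing is taken from the text.

```python
import math
import random  # used only in the usage example below


def harmonicity_index(frame, fs, fmin=80.0, fmax=400.0, demodulate=False):
    """Hypothetical harmonicity measure: peak normalized autocorrelation.

    Returns a value near 1 for a strongly periodic (voiced) frame and a
    small value for white noise. This is an assumed stand-in, not the
    index defined in the paper.
    """
    x = list(frame)
    if demodulate:
        # Assumed crude demodulation: half-wave rectification.
        x = [max(v, 0.0) for v in x]
    # Remove the DC offset so a positive (rectified) signal is not trivially correlated.
    mean = sum(x) / len(x)
    x = [v - mean for v in x]
    n = len(x)
    # Candidate pitch periods, in samples, for F0 between fmin and fmax.
    lag_min = max(1, int(fs / fmax))
    lag_max = min(n - 1, int(fs / fmin))
    best = 0.0
    for k in range(lag_min, lag_max + 1):
        a, b = x[:n - k], x[k:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0:
            best = max(best, num / den)
    return best


def frames(signal, fs, frame_ms=128.0):
    """Split a signal into non-overlapping frames of frame_ms milliseconds."""
    size = int(fs * frame_ms / 1000.0)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


if __name__ == "__main__":
    fs = 8000
    n = int(0.128 * fs)  # one 128 ms frame
    tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for gain in (0.1, 0.5, 2.0):
        mix = [t + gain * w for t, w in zip(tone, noise)]
        print(gain, round(harmonicity_index(mix, fs), 3))
```

With a 200 Hz tone the index is close to 1, for white noise it stays small, and adding more noise to the tone lowers it. This SNR-dependent behavior is what makes such a measure usable as an enhancement cue, though the exact mapping to SNR depends on the measure chosen.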

