7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Spectral Subtraction in Noisy Environments Applied to Speaker Adaptation Based on HMM Sufficient Statistics

Shingo Yamade, Kanako Matsunami, Akira Baba, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

Nara Institute of Science and Technology, Japan

Noise and speaker adaptation techniques are essential for robust speech recognition in real noisy environments. In this paper, we apply spectral subtraction to an unsupervised speaker adaptation algorithm for noisy environments. The adaptation algorithm consists of the following five steps. (1) Spectral subtraction is applied to the noise-added speech database. (2) Noise-matched acoustic models are trained on the noise-added speech database. (3) HMM sufficient statistics for each speaker are calculated from the noise-added speech database and stored. (4) Based on a single arbitrary utterance, speakers close to the test speaker are selected using speaker GMMs. (5) Speaker-adapted acoustic models are constructed from the HMM sufficient statistics of the selected speakers. We evaluated the unsupervised speaker adaptation algorithm in noisy environments on the 20k dictation task. The recognition experiments show that the speaker-adapted acoustic model achieves 82% word accuracy at 20 dB SNR, about 6% higher than the noise-matched models trained with the forward-backward algorithm.
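As an illustration of step (1), the following is a minimal sketch of power-domain spectral subtraction applied to precomputed power spectra. The abstract does not give the subtraction rule or its parameters, so the over-subtraction factor `alpha`, the spectral floor `beta`, the noise estimate taken from leading speech-free frames, and all function names are assumptions made for this example only.

```python
from typing import List

def estimate_noise_power(frames: List[List[float]], n_noise_frames: int) -> List[float]:
    """Average the power spectra of the first n_noise_frames frames.

    Assumption: these leading frames contain noise only (no speech).
    """
    head = frames[:n_noise_frames]
    n_bins = len(frames[0])
    return [sum(f[k] for f in head) / len(head) for k in range(n_bins)]

def spectral_subtract(frames: List[List[float]], noise_power: List[float],
                      alpha: float = 2.0, beta: float = 0.01) -> List[List[float]]:
    """Subtract alpha * noise_power from every frame, bin by bin.

    Each bin is floored at beta times its noisy power so the result
    never becomes negative. alpha and beta are illustrative values.
    """
    return [[max(p - alpha * n, beta * p) for p, n in zip(frame, noise_power)]
            for frame in frames]

# Toy input: three frequency bins, two noise-only frames, then one speech frame.
power_frames = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [9.0, 1.0, 5.0],
]
noise = estimate_noise_power(power_frames, n_noise_frames=2)
cleaned = spectral_subtract(power_frames, noise)
print(cleaned[2])
```

In a full front end the power spectra would come from a short-time Fourier transform of the noisy waveform, and features would be computed from the subtracted spectra.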

We also investigated the robustness of the adapted models under various SNR conditions and examined integration with supervised MLLR.
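Steps (4) and (5) of the abstract can be sketched as follows, under stated assumptions. Speakers are ranked by the log-likelihood of the test utterance under each speaker GMM, and the stored statistics of the best-ranked speakers are pooled to re-estimate a Gaussian by the usual maximum-likelihood formulas. The one-dimensional features, the single Gaussian being updated, the speaker names, and the numeric values are all hypothetical; the abstract does not specify the model topology or the number of selected speakers.

```python
import math
from typing import Dict, List, Tuple

# One GMM component: (weight, mean, variance), 1-D features for brevity.
Gmm = List[Tuple[float, float, float]]
# Sufficient statistics of one Gaussian: (occupancy, sum of x, sum of x^2).
Stats = Tuple[float, float, float]

def gmm_log_likelihood(gmm: Gmm, frames: List[float]) -> float:
    """Total log-likelihood of the frames under a 1-D GMM."""
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in gmm)
        total += math.log(p)
    return total

def select_speakers(gmms: Dict[str, Gmm], frames: List[float], n_best: int) -> List[str]:
    """Return the n_best speakers whose GMMs score the utterance highest."""
    ranked = sorted(gmms, key=lambda s: gmm_log_likelihood(gmms[s], frames),
                    reverse=True)
    return ranked[:n_best]

def combine_statistics(stats: List[Stats]) -> Tuple[float, float]:
    """Pool statistics over speakers and re-estimate mean and variance."""
    occ = sum(s[0] for s in stats)
    first = sum(s[1] for s in stats)
    second = sum(s[2] for s in stats)
    mean = first / occ
    var = second / occ - mean ** 2
    return mean, var

# Hypothetical training speakers and their stored statistics.
speaker_gmms = {
    "spk_a": [(1.0, 0.0, 1.0)],
    "spk_b": [(1.0, 1.0, 1.0)],
    "spk_c": [(1.0, 5.0, 1.0)],
}
speaker_stats = {
    "spk_a": (10.0, 2.0, 12.0),
    "spk_b": (10.0, 8.0, 18.0),
    "spk_c": (10.0, 50.0, 260.0),
}

test_utterance = [0.2, 0.4, 0.3]  # features of the single adaptation utterance
selected = select_speakers(speaker_gmms, test_utterance, n_best=2)
mean, var = combine_statistics([speaker_stats[s] for s in selected])
print(selected, mean, var)
```

Because the per-speaker statistics are computed once and stored, building an adapted model in this sketch only requires summing and dividing, with no further pass over the training speech.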


Bibliographic reference. Yamade, Shingo / Matsunami, Kanako / Baba, Akira / Lee, Akinobu / Saruwatari, Hiroshi / Shikano, Kiyohiro (2002): "Spectral subtraction in noisy environments applied to speaker adaptation based on HMM sufficient statistics", in ICSLP-2002, 1045-1048.