Third International Conference on Spoken Language Processing (ICSLP 94)
In a noisy environment, the performance of a speech recognition system trained in a quiet environment degrades. We propose a new word recognition method that uses an acoustic-phonetic variability model for the Lombard effect, which is one of the causes of this degradation. In this method, the difference between the spectral envelope of normal speech and that of Lombard speech is represented by the acoustic-phonetic variability model, which comprises a non-linear warping function on the spectral frequency axis for formant shifts and spectral filters for changes in formant bandwidth and spectral tilt. Each model is trained on Lombard speech and assigned to a sub-phoneme HMM. For Lombard speech recognition, the HMMs are modified with the acoustic-phonetic variability models, and the duration parameters are modified to compensate for the changes in word duration caused by the Lombard effect. Recognition experiments were conducted without noise contamination. The Lombard speech data comprised 100 isolated words spoken by 5 male speakers hearing 90 dB (SPL) pink noise through headphones. The recognition rate was 98.6% with this method and 88.4% without it.
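The two components named in the abstract, a frequency warp for formant shift and a filter for spectral tilt, can be illustrated with a minimal sketch. This is not the authors' parametrization: the sinusoidal warp, the linear tilt in dB per kHz, and the names warp_frequency, adapt_envelope, alpha, and tilt_db_per_khz are all assumptions chosen for illustration, and the bandwidth filter is omitted. The sketch assumes a log-magnitude (dB) envelope sampled on a uniform frequency grid, where a filter becomes an additive term.

```python
import math

def warp_frequency(f, f_max, alpha):
    """Hypothetical non-linear warp of [0, f_max] onto itself.

    Endpoints stay fixed; alpha > 0 pushes mid-band frequencies upward.
    The mapping is monotonic as long as |alpha| < 1.
    """
    x = f / f_max
    return f_max * (x + alpha * math.sin(math.pi * x) / math.pi)

def interp(x, xs, ys):
    """Linear interpolation of the points (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

def adapt_envelope(log_env, f_max, alpha, tilt_db_per_khz):
    """Map a normal-speech log envelope toward a Lombard-like one (illustrative only)."""
    n = len(log_env)
    freqs = [f_max * k / (n - 1) for k in range(n)]
    # Move each envelope sample to its warped frequency (formant shift) ...
    warped = [warp_frequency(f, f_max, alpha) for f in freqs]
    # ... then resample the shifted envelope back onto the uniform grid.
    shifted = [interp(f, warped, log_env) for f in freqs]
    # A filter is additive in the log domain; here it is a linear tilt only.
    return [v + tilt_db_per_khz * f / 1000.0 for v, f in zip(shifted, freqs)]
```

For example, calling adapt_envelope on an envelope with a single peak, using a positive alpha and zero tilt, moves the peak to a higher frequency, which mimics an upward formant shift. In the method described above, such a transformation would be applied to the spectral parameters of each sub-phoneme HMM, with the parameters estimated from Lombard speech rather than set by hand.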
Bibliographic reference. Suzuki, Tadashi / Nakajima, Kunio / Abe, Yoshiharu (1994): "Isolated word recognition using models for acoustic phonetic variability by lombard effect", In ICSLP-1994, 999-1002.