Workshop on the Auditory Basis of Speech Perception

Keele University, UK
July 15-19, 1996

Channel Normalisation by Using RASTA Filtering and the Dynamic Cepstrum for Automatic Speech Recognition over the Phone

Peter-Pal Boda (1), Johan de Veth (2), Louis Boves (2,3)

(1) Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo, Finland
(2) Department of Language and Speech, University of Nijmegen, Nijmegen, The Netherlands
(3) KPN Research, Leidschendam, The Netherlands

Human auditory perception is perfectly capable of dealing with time-invariant linear filter effects, such as those introduced by telephone handsets and telephone channels. We compared two different schemes for modeling human auditory time-frequency masking: RASTA filtering and the dynamic cepstrum representation (DCR). We used a small set of context-independent phone hidden Markov models for a connected digit string recognition task over the telephone. We found that RASTA filtering outperformed the Gaussian DCR approach, even though RASTA is a cruder approximation of human forward masking. Our results may be influenced by our choice of the mel-frequency cepstral representation. The superior performance of the RASTA technique may also be explained by the fact that the frequency response of the RASTA filter is better matched to the region of modulation frequencies where human auditory perception is most sensitive.
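
To illustrate why a band-pass filter on feature trajectories can normalise the channel, the sketch below applies a RASTA-style filter to synthetic cepstral trajectories. A time-invariant linear channel adds a roughly constant offset to each cepstral coefficient, and a filter with zero gain at 0 Hz modulation frequency removes that offset. This is a minimal sketch and not the authors' implementation: the coefficients follow the commonly quoted RASTA transfer function H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1) in its causal form, and the function name `rasta_filter`, the pole value, and the synthetic data are all assumptions, since the abstract gives no filter parameters.

```python
import numpy as np


def rasta_filter(traj, pole=0.98):
    """Band-pass filter feature trajectories along time (axis 0).

    traj: array of shape (frames,) or (frames, coefficients).
    pole: feedback coefficient; 0.98 is an assumed, commonly quoted value.
    """
    x = np.asarray(traj, dtype=float)
    # FIR numerator: a regression-line slope estimate over 5 frames.
    # Its taps sum to zero, so the gain at 0 Hz modulation frequency is zero.
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    y = np.zeros_like(x)
    for t in range(x.shape[0]):
        acc = np.zeros_like(x[0])
        for k, bk in enumerate(b):
            if t - k >= 0:
                acc = acc + bk * x[t - k]
        if t >= 1:
            # Single-pole leaky integrator (the IIR part of the filter).
            acc = acc + pole * y[t - 1]
        y[t] = acc
    return y


# Synthetic example (hypothetical data): 600 frames, 3 cepstral coefficients.
rng = np.random.default_rng(0)
clean = rng.standard_normal((600, 3))
channel_offset = np.array([1.0, -2.0, 0.5])  # constant per-coefficient shift
shifted = clean + channel_offset

# Because the filter is linear, this difference is the filter's response
# to the constant offset alone; it decays towards zero over time.
residual = rasta_filter(shifted) - rasta_filter(clean)
print("largest residual at frame 3:  ", np.max(np.abs(residual[3])))
print("largest residual at last frame:", np.max(np.abs(residual[-1])))
```

With a pole this close to 1 the offset decays slowly, so the first frames of an utterance still carry part of the channel offset.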

Bibliographic reference. Boda, Peter-Pal / Veth, Johan de / Boves, Louis (1996): "Channel normalisation by using RASTA filtering and the dynamic cepstrum for automatic speech recognition over the phone", in ABSP-1996, 317-320.