Workshop on the Auditory Basis of Speech Perception, Keele University, UK
Human auditory perception copes well with time-invariant linear filter effects, such as those introduced by telephone handsets and telephone channels. We compared two different schemes for modeling human auditory time-frequency masking: RASTA filtering and the dynamic cepstrum representation (DCR). We used a small set of context-independent phone hidden Markov models to recognise connected digit strings over the telephone. We found that RASTA filtering outperformed the Gaussian DCR approach, even though RASTA is a cruder approximation of human forward masking. Our results may be influenced by our choice of the mel-frequency cepstral representation. The superior performance of the RASTA technique may also be explained by the fact that the frequency response of the RASTA filter is better matched to the region of modulation frequencies where human auditory perception is most sensitive.
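To illustrate why band-pass filtering of feature trajectories normalises a time-invariant channel, here is a minimal sketch. The abstract does not give the filter coefficients used in this study, so the code assumes the commonly cited RASTA filter (numerator 0.1 × [2, 1, 0, -1, -2], single pole at 0.98), applied causally along time to each cepstral coefficient. The function name `rasta_filter`, the 13-coefficient feature size, and the synthetic data are hypothetical choices for illustration only. A fixed channel appears as a constant offset in the cepstral domain, and because the numerator coefficients sum to zero, that offset is suppressed once the initial transient has decayed.

```python
import numpy as np

def rasta_filter(traj, pole=0.98):
    """Band-pass filter each feature trajectory along the time axis.

    traj: array of shape (n_frames, n_coeffs), e.g. cepstral coefficients.
    Assumed filter (not taken from the abstract):
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    This is the causal form; the usual 4-frame time advance is omitted.
    """
    traj = np.asarray(traj, dtype=float)
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # sums to zero: no DC gain
    out = np.zeros_like(traj)
    for n in range(traj.shape[0]):
        acc = np.zeros(traj.shape[1])
        # FIR part: weighted sum of the current and four previous frames
        for k, b in enumerate(numer):
            if n - k >= 0:
                acc += b * traj[n - k]
        # IIR part: leaky integration of the previous output frame
        if n > 0:
            acc += pole * out[n - 1]
        out[n] = acc
    return out

# Synthetic demonstration: the same features with and without a constant
# cepstral offset standing in for a fixed telephone channel (hypothetical data).
rng = np.random.default_rng(0)
clean = rng.standard_normal((600, 13))
channel = np.linspace(-3.0, 3.0, 13)
diff = rasta_filter(clean + channel) - rasta_filter(clean)

print("largest channel residue, frame 3:   ", np.abs(diff[3]).max())
print("largest channel residue, last frame:", np.abs(diff[-1]).max())
```

The residue is large in the first few frames and decays geometrically with the pole value, so in this sketch the offset only vanishes after a few hundred frames. A pole closer to zero shortens the transient at the cost of a higher low-frequency cutoff.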
Bibliographic reference. Boda, Peter-Pal / Veth, Johan de / Boves, Louis (1996): "Channel normalisation by using RASTA filtering and the dynamic cepstrum for automatic speech recognition over the phone", In ABSP-1996, 317-320.