ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium, September 18-20, 2000
Two recognition tasks are discussed in which pre-processing based on amplitude modulation (AM) maps is compared with other feature extraction strategies. In the first task we show how the AM map representation can be used to segregate voiced speech signals from one another. The second shows how the AM representation can be used for robust digit recognition in additive noise.
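The abstract does not spell out how an AM map is computed. As a rough illustration only, the sketch below assumes one common reading: split the signal into frequency channels, take each channel's envelope, and analyse the rate at which that envelope is modulated. When a channel passes several harmonics of a voiced sound, its envelope fluctuates at the fundamental frequency (F0), so channels dominated by different voices show different dominant modulation rates, which is what allows them to be grouped apart. The ideal rectangular filters, the 320 Hz bandwidth, the names `am_map` and `dominant_rates`, and the synthetic two-voice mixture are all assumptions, not the authors' method.

```python
import numpy as np

def am_map(x, fs, centre_freqs, bandwidth=320.0, max_rate=400.0):
    """Hypothetical AM-map sketch: rows are frequency channels, columns are
    modulation rates (Hz) of each channel's envelope."""
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    rates = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = rates <= max_rate
    rows = []
    for fc in centre_freqs:
        # Ideal band-pass applied to positive frequencies only and doubled:
        # this gives the analytic (Hilbert) signal of the band directly.
        band = (freqs > 0) & (np.abs(freqs - fc) <= bandwidth / 2)
        analytic = np.fft.ifft(np.where(band, 2.0 * spectrum, 0.0))
        envelope = np.abs(analytic)
        # Modulation spectrum of the mean-removed envelope.
        rows.append(np.abs(np.fft.rfft(envelope - envelope.mean()))[keep])
    return np.array(rows), rates[keep]

def dominant_rates(amap, rates):
    """Strongest modulation rate per channel, i.e. the F0 dominating it."""
    return rates[np.argmax(amap, axis=1)]

# Toy mixture: two harmonic "voices" occupying different frequency regions.
fs = 8000
t = np.arange(fs) / fs  # 1 s of signal, so the rate resolution is 1 Hz
voice_a = sum(np.cos(2 * np.pi * 100 * k * t) for k in range(1, 11))   # F0 = 100 Hz, up to 1 kHz
voice_b = sum(np.cos(2 * np.pi * 125 * k * t) for k in range(12, 25))  # F0 = 125 Hz, 1.5-3 kHz
amap, rates = am_map(voice_a + voice_b, fs, centre_freqs=[500, 2000])
print(dominant_rates(amap, rates))  # one dominant rate per channel: about 100 Hz and 125 Hz
```

Each channel here sees three harmonics of only one voice, so its envelope is periodic at that voice's F0; real speech mixtures overlap in frequency and would need finer channels and per-frame analysis.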
Natural vowels from the TIMIT database are presented concurrently with a second vowel and recognised using a multilayer perceptron. AM-map-based pre-processing is compared with Parsons' harmonic selection algorithm and with a baseline that applies no noise reduction. The proposed feature extraction algorithm yields a recognition improvement equivalent to a 6 dB increase in signal-to-noise ratio (SNR) over the other two strategies.
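To make the comparison concrete, the sketch below mixes a target with an interfering harmonic sound at a chosen SNR and then applies a deliberately simplified form of harmonic selection: keep only the spectral bins near integer multiples of the target F0. This is a hypothetical stand-in, not Parsons' published algorithm; it assumes a single known, constant F0 and processes the whole signal at once, whereas a practical implementation would work frame by frame with an estimated pitch track. The names `mix_at_snr`, `harmonic_select`, `output_snr_db` and the 15 Hz tolerance are illustrative assumptions.

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so that 10*log10(P_target / P_interferer) == snr_db."""
    p_t = np.mean(target ** 2)
    p_i = np.mean(interferer ** 2)
    gain = np.sqrt(p_t / (p_i * 10 ** (snr_db / 10)))
    return target + gain * interferer

def harmonic_select(mixture, fs, f0, tolerance_hz=15.0):
    """Simplified harmonic selection: keep bins within `tolerance_hz` of a
    multiple of the target F0, zero the rest, and resynthesise."""
    n = len(mixture)
    spectrum = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    harmonic_number = np.round(freqs / f0)
    near_harmonic = (harmonic_number >= 1) & (
        np.abs(freqs - harmonic_number * f0) <= tolerance_hz)
    return np.fft.irfft(np.where(near_harmonic, spectrum, 0.0), n)

def output_snr_db(clean, estimate):
    """SNR of an estimate, treating everything that differs from `clean` as noise."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))

fs = 8000
t = np.arange(fs) / fs
target = sum(np.cos(2 * np.pi * 100 * k * t) / k for k in range(1, 31))      # F0 = 100 Hz
interferer = sum(np.cos(2 * np.pi * 130 * k * t) / k for k in range(1, 24))  # F0 = 130 Hz
mixture = mix_at_snr(target, interferer, snr_db=0.0)
recovered = harmonic_select(mixture, fs, f0=100.0)
print(output_snr_db(target, mixture), output_snr_db(target, recovered))
```

For this toy mixture the SNR should rise from 0 dB to roughly 10 dB; the residual comes from interferer harmonics that happen to lie close to multiples of 100 Hz. For scale, a 6 dB SNR difference corresponds to about a fourfold power ratio (10**(6/10) ≈ 3.98).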
Digits (from OGI Alphadigits) were presented in clean conditions, in white noise, and in rapidly varying high-pass/low-pass noise. Recognition performance, based on an 8-state left-to-right hidden Markov model (HMM), is compared for conventional mel-frequency cepstral coefficients (MFCCs), auditory filterbank output, and the spectra recovered from AM maps. For clean speech all three strategies give error rates of 6-8%, but as the noise level increases, recognition scores consistently show AM maps to be the most robust strategy.
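The recogniser uses an 8-state left-to-right HMM per word. A minimal sketch of that topology, assuming each state may only repeat or advance by one, is given below together with Viterbi scoring of a frame sequence. The 0.6 self-loop probability, the absorbing final state, and the names `left_to_right_transitions` and `viterbi_log_score` are assumptions for illustration; in a real recogniser the transition probabilities are trained (e.g. with Baum-Welch), and the emission scores come from models of the MFCC, filterbank, or AM-map feature vectors.

```python
import numpy as np

def left_to_right_transitions(n_states=8, p_stay=0.6):
    """Transition matrix in which each state either repeats or advances by one."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        a[i, i] = p_stay
        a[i, i + 1] = 1.0 - p_stay
    a[-1, -1] = 1.0  # assumed absorbing final state
    return a

def viterbi_log_score(log_a, log_b):
    """Best-path log-likelihood for a path starting in state 0 and ending in
    the last state. log_b[t, j] is the log emission score of frame t in state j."""
    n_frames, n_states = log_b.shape
    delta = np.full(n_states, -np.inf)
    delta[0] = log_b[0, 0]
    for t in range(1, n_frames):
        # delta_new[j] = max_i (delta[i] + log_a[i, j]) + log_b[t, j]
        delta = np.max(delta[:, None] + log_a, axis=0) + log_b[t]
    return delta[-1]

with np.errstate(divide="ignore"):  # log(0) -> -inf marks forbidden transitions
    log_a = np.log(left_to_right_transitions())
log_b = np.zeros((10, 8))  # dummy emissions: every frame equally likely in every state
print(viterbi_log_score(log_a, log_b))  # seven forward moves, then the final state absorbs
```

With dummy emissions the best path advances as early as possible, so the score equals 7·log(0.4); an utterance shorter than eight frames cannot reach the final state and scores -inf. Isolated-digit recognition would compute this score under each digit's model and pick the highest.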
Bibliographic reference. Meyer, G. F. / Edmonds, B. A. / Yang, D. / Ainsworth, William A. (2000): "Amplitude modulation maps for robust speech recognition", In ASR-2000, 168-174.