EUROSPEECH 2003 - INTERSPEECH 2003
We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. At the output of each band-pass filter, the signal is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into an analytic component, $\dot{\alpha}(t) + j\hat{\dot{\alpha}}(t)$, and an anti-analytic component, $\dot{\beta}(t) + j\hat{\dot{\beta}}(t)$. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. More refined features could also be extracted from the analytic and anti-analytic components, but this is not done in this paper. We evaluated the front end on the Aurora 2 task, in which the noise corruption is synthetic. For clean training, compared to the mel-cepstrum front end with a 3-mixture HMM back end, our AIF/ALE front end achieves an average improvement of 13.97% on set A and 17.92% on set B, and a negative "improvement" of -31.72% on set C. The overall improvement in accuracy for clean training is 7.97%. Although the improvements are modest, the novelty of the front end and its potential for future enhancement are its strengths.
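The two coarse features can be illustrated with a minimal sketch. It does not reproduce the log-derivative decomposition described in the abstract. As a stand-in it uses the conventional analytic signal, obtained by keeping only the positive-frequency bins of one band, and then averages the instantaneous frequency and the log-envelope over the whole segment. The function names (`bandpass_analytic`, `aif_ale`), the brick-wall FFT filter, the band edges, and the averaging window are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def bandpass_analytic(x, fs, f_lo, f_hi):
    """Analytic band-pass signal via FFT masking (hypothetical helper).

    Only positive-frequency bins inside [f_lo, f_hi] are kept and doubled,
    so the result is complex with a one-sided spectrum.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)  # negative bins are excluded
    return np.fft.ifft(2.0 * spectrum * mask)


def aif_ale(x, fs, f_lo, f_hi, eps=1e-12):
    """Return (AIF in Hz, ALE in natural-log units) for one band.

    Assumption: both averages are taken over the entire input segment;
    a real front end would presumably average frame by frame.
    """
    z = bandpass_analytic(x, fs, f_lo, f_hi)
    envelope = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # Instantaneous frequency as the discrete derivative of the phase.
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    aif = float(np.mean(inst_freq))
    ale = float(np.mean(np.log(envelope + eps)))  # eps guards against log(0)
    return aif, ale


if __name__ == "__main__":
    fs = 8000.0
    t = np.arange(800) / fs
    tone = 0.5 * np.cos(2.0 * np.pi * 1000.0 * t)  # 1 kHz tone, amplitude 0.5
    print(aif_ale(tone, fs, 500.0, 1500.0))
```

For a pure tone inside the band, the AIF should sit at the tone frequency and the ALE at the logarithm of its amplitude, which makes the sketch easy to sanity-check before applying it to speech.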
Bibliographic reference. Wang, Yadong / Hansen, Jesse / Allu, Gopi Krishna / Kumaresan, Ramdas (2003): "Average instantaneous frequency (AIF) and average log-envelopes (ALE) for ASR with the Aurora 2 database", In EUROSPEECH-2003, 25-28.