This paper investigates the combination of evidence coming from different frequency channels obtained by filtering the speech signal at different auditory and modulation frequencies. In our previous work, we showed that the combination of classifiers trained on different ranges of modulation frequencies is more effective when performed in a sequential (hierarchical) fashion. In this work we verify that the combination of classifiers trained on different ranges of auditory frequencies is more effective when performed in a parallel fashion. Furthermore, we propose an architecture based on neural networks for combining evidence coming from different auditory-modulation frequency sub-bands that takes advantage of these findings. This reduces the final WER by 6.2% absolute (from 45.8% to 39.6%) w.r.t. the single-classifier approach in an LVCSR task.
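The two combination schemes contrasted in the abstract can be illustrated with a minimal sketch. All names and dimensions below are hypothetical, and a single linear layer stands in for the MLP combiners the paper trains: parallel fusion averages the posteriors of sub-band classifiers, while hierarchical fusion feeds one classifier's posteriors as extra inputs to the next stage.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Row-wise softmax over class scores."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def combine_parallel(posteriors):
    """Parallel fusion: average posteriors from classifiers trained
    on different auditory-frequency sub-bands (simple mean here;
    the paper uses a trained combiner)."""
    return np.mean(posteriors, axis=0)

def combine_hierarchical(post_first, feats_second, weights):
    """Sequential (hierarchical) fusion: the second-stage classifier
    sees its own sub-band features concatenated with the first
    classifier's posterior estimates."""
    inp = np.concatenate([feats_second, post_first], axis=-1)
    return softmax(inp @ weights)  # linear stand-in for an MLP

# Toy dimensions: 4 frames, 3 phone classes, 5-dim sub-band features.
frames, classes, dim = 4, 3, 5
post_band1 = softmax(rng.standard_normal((frames, classes)))
post_band2 = softmax(rng.standard_normal((frames, classes)))
feats_band2 = rng.standard_normal((frames, dim))
W = rng.standard_normal((dim + classes, classes))

p_parallel = combine_parallel([post_band1, post_band2])
p_hierarchical = combine_hierarchical(post_band1, feats_band2, W)
```

Both schemes output per-frame class posteriors; the paper's finding is that the hierarchical form suits modulation-frequency channels while the parallel form suits auditory-frequency channels.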
Bibliographic reference. Valente, Fabio / Hermansky, Hynek (2008): "On the combination of auditory and modulation frequency channels for ASR applications", In INTERSPEECH-2008, 2242-2245.