EUROSPEECH 2003 - INTERSPEECH 2003
TRAP-based ASR attempts to extract information from rather long (up to 1 s) and narrow (one critical band) patches of the time-frequency plane, called temporal patterns. We investigate the effect of combining temporal patterns of logarithmic critical-band energies from several adjacent bands. The frequency context is gradually increased from one critical band to several critical bands by feeding temporal patterns from adjacent bands jointly into the class-posterior estimators. We show that a frequency context of up to three critical bands is required to achieve higher recognition performance. This work also indicates that interaction among local bands is important for improved speech recognition performance.
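As an illustration of the joint input described above, the following is a minimal Python sketch of how the temporal patterns of several adjacent critical bands might be concatenated into one input vector for a class-posterior estimator. It is not the authors' implementation, and several details are assumptions: a 10 ms frame shift (so 101 frames span roughly 1 s), band-major concatenation, repetition of edge frames at utterance boundaries, and the hypothetical function name `trap_input` and its parameters.

```python
import numpy as np


def trap_input(log_energies: np.ndarray, t: int, start_band: int,
               num_bands: int, half_window: int = 50) -> np.ndarray:
    """Concatenate temporal patterns from `num_bands` adjacent critical bands.

    Hypothetical helper (assumption, not from the paper).
    log_energies: shape (num_frames, num_critical_bands), logarithmic
    critical-band energies, assumed to use a 10 ms frame shift.
    Returns a vector of length num_bands * (2 * half_window + 1).
    """
    num_frames, total_bands = log_energies.shape
    if not 0 <= t < num_frames:
        raise ValueError("frame index outside the utterance")
    if start_band < 0 or start_band + num_bands > total_bands:
        raise ValueError("band range outside the critical-band axis")

    # Assumption: repeat the first/last frame so windows near the
    # utterance edges keep their full length.
    padded = np.pad(log_energies, ((half_window, half_window), (0, 0)),
                    mode="edge")

    # Original frame t sits at index t + half_window in `padded`, so this
    # slice is centred on frame t and spans 2 * half_window + 1 frames.
    window = padded[t:t + 2 * half_window + 1,
                    start_band:start_band + num_bands]

    # Band-major order: all frames of the first band, then the next band, ...
    return window.T.reshape(-1)


if __name__ == "__main__":
    # Toy data: 200 frames x 15 critical bands of synthetic "log energies".
    energies = np.random.default_rng(0).normal(size=(200, 15))
    one_band = trap_input(energies, t=100, start_band=4, num_bands=1)
    three_bands = trap_input(energies, t=100, start_band=4, num_bands=3)
    print(one_band.shape, three_bands.shape)
```

Setting `num_bands=1` gives a single-band temporal pattern, while `num_bands=3` gives the three-critical-band context that the abstract reports as needed for higher performance. Widening the context only grows the input vector; what the estimator does with that vector is outside the scope of this sketch.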
Bibliographic reference. Jain, Pratibha / Hermansky, Hynek (2003): "Beyond a single critical-band in TRAP based ASR", In EUROSPEECH-2003, 437-440.