EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

        

Learning Discriminative Temporal Patterns in Speech: Development of Novel TRAPS-Like Classifiers

Barry Chen (1), Shuangyu Chang (2), Sunil Sivadas (3)

(1) International Computer Science Institute, USA
(2) University of California at Berkeley, USA
(3) Oregon Health & Science University, USA

Motivated by the temporal processing properties of human hearing, researchers have explored various methods to incorporate temporal and contextual information in ASR systems. One such approach, TempoRAl PatternS (TRAPS), takes temporal processing to the extreme and analyzes the energy pattern over long periods of time (500 ms to 1000 ms) within separate critical bands of speech. In this paper we extend the work on TRAPS by experimenting with two novel variants developed to address some shortcomings of the TRAPS classifiers. Both Hidden Activation TRAPS (HATS) and Tonotopic Multi-Layer Perceptrons (TMLP) require 84% fewer parameters than TRAPS, yet achieve significant reductions in phone recognition error when tested on the TIMIT corpus under clean, reverberant, and several noise conditions. In addition, the TMLP is trained in a single stage and does not require critical-band-level training targets. Using these variants, we find that approximately 20 discriminative temporal patterns per critical band are sufficient for good recognition performance. In combination with a conventional PLP system, these TRAPS variants achieve significant additional performance improvements.
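As a rough illustration of the kind of input TRAPS-like classifiers operate on, the sketch below cuts one long temporal trajectory per critical band out of a band-energy spectrogram. It is not the authors' implementation. The function name `band_trajectories`, the 10 ms frame step, the 15-band toy spectrogram, and the edge-clamping at utterance boundaries are all assumptions made for the example; with a 10 ms step, 51 frames span roughly 500 ms.

```python
from typing import List, Sequence


def band_trajectories(
    spectrogram: Sequence[Sequence[float]],
    center: int,
    half_width: int = 25,
) -> List[List[float]]:
    """Return one temporal trajectory per critical band.

    spectrogram[t][b] is the energy of band b at frame t.
    Each trajectory holds 2 * half_width + 1 frames centered on `center`.
    Frames outside the utterance are clamped to the nearest valid frame
    (an assumed padding strategy, chosen only for illustration).
    """
    if not spectrogram:
        raise ValueError("spectrogram must contain at least one frame")
    n_frames = len(spectrogram)
    n_bands = len(spectrogram[0])
    trajectories = []
    for band in range(n_bands):
        trajectory = []
        for t in range(center - half_width, center + half_width + 1):
            clamped = min(max(t, 0), n_frames - 1)
            trajectory.append(spectrogram[clamped][band])
        trajectories.append(trajectory)
    return trajectories


# Toy data (hypothetical): 200 frames, 15 bands, value encodes frame and band.
N_FRAMES, N_BANDS = 200, 15
toy = [[float(t * 100 + b) for b in range(N_BANDS)] for t in range(N_FRAMES)]

# One 51-frame trajectory per band, centered on frame 100.
patterns = band_trajectories(toy, center=100)
```

In a TRAPS-style system each of these per-band trajectories would be fed to its own band-level classifier, whose outputs are then merged; that classification stage is not shown here.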

Bibliographic reference.  Chen, Barry / Chang, Shuangyu / Sivadas, Sunil (2003): "Learning discriminative temporal patterns in speech: development of novel TRAPS-like classifiers", In EUROSPEECH-2003, 853-856.