14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Context-Dependent Modeling and Speaker Normalization Applied to Reservoir-Based Phone Recognition

Fabian Triefenbach, Azarakhsh Jalalvand, Kris Demuynck, Jean-Pierre Martens

Universiteit Gent, Belgium

Reservoir Computing (RC) has recently been introduced as an interesting alternative for acoustic modeling. For phone and continuous digit recognition, the reservoir approach obtained promising results. In this work, we elaborate this concept further by porting to Reservoir Computing some well-known techniques used to enhance the recognition rates of GMM-based models. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and pronunciation mismatches arising from an imperfect lexicon. We also propose to incorporate two speaker normalization methods in the feature space, namely mean and variance normalization and vocal tract length normalization. The impact of the investigated techniques is studied in the context of phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22% and a speaker-dependent PER of 20.5%. By combining GMM and RC-based likelihoods at the state level, these scores can be reduced further.
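
As an illustration of the first speaker normalization method named above, the following is a minimal sketch of mean and variance normalization in the feature space. It assumes that normalization is applied per speaker and per feature dimension, using the population standard deviation. The function name, the `eps` floor, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev


def mean_variance_normalize(frames, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance.

    frames: list of equal-length feature vectors (lists of floats),
            assumed here to come from a single speaker.
    eps:    floor on the standard deviation to avoid division by zero
            (an assumption of this sketch).
    """
    if not frames:
        return []
    # Transpose so that each tuple holds one feature dimension over all frames.
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    stds = [max(pstdev(d), eps) for d in dims]
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


# Hypothetical toy data: three 2-dimensional frames from one speaker.
speaker_frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = mean_variance_normalize(speaker_frames)
```

In this sketch the statistics are computed over all frames of one speaker, so both feature dimensions of the toy data end up on the same scale regardless of their original ranges.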


Bibliographic reference. Triefenbach, Fabian / Jalalvand, Azarakhsh / Demuynck, Kris / Martens, Jean-Pierre (2013): "Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition", in INTERSPEECH-2013, 3342-3346.