INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Distinctive Feature Fusion for Recognition of Australian English Consonants

Trent W. Lewis, David M. W. Powers

Flinders University, Australia

Audio-visual automatic speech recognition promises to make speech recognition viable in noisy environments. Early and late fusion approaches dominate the field but may ignore linguistically relevant features. Distinctive features offer an alternative unit for fusion, and earlier research has shown that this is feasible on subsets of phonemes [1]. This paper outlines two extended models, multiclass and binary, and the results suggest that a 20 dB gain over audio-only recognition is achievable in low-SNR environments.

Reference

  1. T. Lewis and D. Powers, "Distinctive feature fusion for improved audio-visual phoneme recognition," in The Eighth International Symposium on Signal Processing and Its Applications, A. Bouzerdoum and A. Beghdadi, Eds. Sydney, Australia: IEEE, 2005.

Bibliographic reference. Lewis, Trent W. / Powers, David M. W. (2008): "Distinctive feature fusion for recognition of Australian English consonants", in INTERSPEECH-2008, 2671-2674.