11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

A Phoneme Recognition Framework Based on Auditory Spectro-Temporal Receptive Fields

Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky

Johns Hopkins University, USA

In this paper, we propose incorporating features derived from the spectro-temporal receptive fields (STRFs) of neurons in the auditory cortex into the task of phoneme recognition. Each STRF is tuned to a different combination of auditory frequency, scale, and modulation rate. We select different sets of STRFs, each specific to the phonemes of a different broad phonetic class (BPC) of sounds. These STRFs are then applied as spectro-temporal filters to spectrograms of speech to extract features for phoneme recognition. On the TIMIT phoneme recognition task, the proposed features show an improvement of about 5% over conventional feature extraction techniques.
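The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gabor-shaped kernel below is only a hypothetical stand-in for an STRF, the function names (`gabor_strf`, `apply_strf`, `extract_features`) and all parameter values are assumptions made for illustration, and the per-BPC selection of STRFs is not shown. The sketch only shows how a bank of spectro-temporal filters could be correlated with a spectrogram and stacked into one feature vector per frame.

```python
import math


def gabor_strf(n_freq, n_time, scale, rate):
    """Toy STRF: a 2D Gabor patch (hypothetical stand-in, not a measured STRF).

    scale: spectral modulation, in cycles per frequency channel.
    rate:  temporal modulation, in cycles per frame.
    """
    fc, tc = (n_freq - 1) / 2.0, (n_time - 1) / 2.0
    kernel = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            df, dt = f - fc, t - tc
            # Gaussian envelope localises the filter in frequency and time.
            envelope = math.exp(
                -0.5 * ((df / (n_freq / 4.0)) ** 2 + (dt / (n_time / 4.0)) ** 2)
            )
            # Cosine carrier sets the preferred scale and rate.
            carrier = math.cos(2.0 * math.pi * (scale * df + rate * dt))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def apply_strf(spectrogram, strf):
    """Correlate a spectrogram (freq x time lists) with an STRF.

    Zero padding is used at the edges, so the output has the same
    shape as the input spectrogram.
    """
    n_f, n_t = len(spectrogram), len(spectrogram[0])
    k_f, k_t = len(strf), len(strf[0])
    off_f, off_t = k_f // 2, k_t // 2
    out = [[0.0] * n_t for _ in range(n_f)]
    for f in range(n_f):
        for t in range(n_t):
            acc = 0.0
            for i in range(k_f):
                ff = f + i - off_f
                if ff < 0 or ff >= n_f:
                    continue
                for j in range(k_t):
                    tt = t + j - off_t
                    if 0 <= tt < n_t:
                        acc += strf[i][j] * spectrogram[ff][tt]
            out[f][t] = acc
    return out


def extract_features(spectrogram, strf_bank):
    """Return one feature vector per frame: all STRF responses, all channels."""
    responses = [apply_strf(spectrogram, s) for s in strf_bank]
    n_t = len(spectrogram[0])
    return [
        [r[f][t] for r in responses for f in range(len(r))]
        for t in range(n_t)
    ]
```

With a spectrogram of F channels and T frames and a bank of K filters, `extract_features` returns T vectors of length K·F. In a recognizer, such vectors would typically be reduced in dimension before being passed to a classifier; that step is outside the scope of this sketch.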


Bibliographic reference. Thomas, Samuel / Patil, Kailash / Ganapathy, Sriram / Mesgarani, Nima / Hermansky, Hynek (2010): "A phoneme recognition framework based on auditory spectro-temporal receptive fields", in INTERSPEECH-2010, 2458-2461.