10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Multi-Stream to Many-Stream: Using Spectro-Temporal Features for ASR

Sherry Y. Zhao, Suman Ravuri, Nelson Morgan


We report progress in the use of multi-stream spectro-temporal features for both small and large vocabulary automatic speech recognition tasks. In this approach, features are divided into multiple streams that are processed in parallel and used dynamically. In small vocabulary speech recognition experiments, incorporating up to 28 dynamically weighted spectro-temporal feature streams along with MFCCs yields roughly a 21% improvement over the baseline in low-noise conditions and a 47% improvement in noise-added conditions, a greater improvement over the baseline than in our previous work. A four-stream framework yields a 14% improvement over the baseline in the large vocabulary low-noise recognition experiment. These results suggest that dividing spectro-temporal features into multiple streams may be an effective way to flexibly utilize an inherently large number of features for automatic speech recognition.
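
The abstract does not say how the stream weights are computed. As a rough illustration of what dynamic weighting of streams can look like, the sketch below combines per-stream class posteriors for a single frame using inverse-entropy weights, a scheme commonly used in multi-stream ASR. It is an assumption made for illustration, not the authors' method; the function names, the three-stream example, and the posterior values are all hypothetical.

```python
import math


def entropy(posterior):
    """Shannon entropy (in nats) of one posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def inverse_entropy_weights(posteriors, eps=1e-8):
    """One weight per stream, proportional to 1 / entropy, normalized to sum to 1.

    Assumption: a low-entropy (confident) stream should count for more.
    """
    inv = [1.0 / (entropy(p) + eps) for p in posteriors]
    total = sum(inv)
    return [v / total for v in inv]


def combine_streams(posteriors):
    """Weighted average of the per-stream posteriors for a single frame."""
    weights = inverse_entropy_weights(posteriors)
    n_classes = len(posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]
    return weights, combined


# Hypothetical frame: three streams, four phone classes.
frame = [
    [0.85, 0.05, 0.05, 0.05],  # confident stream
    [0.25, 0.25, 0.25, 0.25],  # uninformative stream
    [0.60, 0.20, 0.10, 0.10],  # moderately confident stream
]
weights, combined = combine_streams(frame)
```

Because the weights are recomputed for every frame, a stream that becomes unreliable (for example under added noise) contributes less to that frame's combined posterior, while the combined vector remains a valid probability distribution.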

Bibliographic reference. Zhao, Sherry Y. / Ravuri, Suman / Morgan, Nelson (2009): "Multi-stream to many-stream: using spectro-temporal features for ASR", in INTERSPEECH-2009, 2951-2954.