ISCA Archive Interspeech 2009

Multi-stream to many-stream: using spectro-temporal features for ASR

Sherry Y. Zhao, Suman Ravuri, Nelson Morgan

We report progress in the use of multi-stream spectro-temporal features for both small- and large-vocabulary automatic speech recognition (ASR) tasks. In this approach, features are divided into multiple streams for parallel processing and dynamic utilization. In small-vocabulary experiments, incorporating up to 28 dynamically weighted spectro-temporal feature streams along with MFCCs yields roughly a 21% improvement over the baseline in low-noise conditions and a 47% improvement in noise-added conditions, a larger gain over the baseline than in our previous work. In the large-vocabulary, low-noise experiment, a four-stream framework yields a 14% improvement over the baseline. These results suggest that dividing spectro-temporal features into multiple streams may be an effective way to flexibly exploit an inherently large number of features for ASR.
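The abstract does not specify how the stream weights are computed. Purely as an illustration, here is a minimal sketch of one widely used dynamic scheme, inverse-entropy weighting. It assumes that each stream produces per-frame class posteriors (for example, from an MLP). The function names `entropy` and `inverse_entropy_combine` are hypothetical, and this is not presented as the paper's method.

```python
import numpy as np

def entropy(posteriors, eps=1e-12):
    """Per-frame Shannon entropy (in nats) of each stream's class posteriors.

    posteriors: array of shape (num_streams, num_frames, num_classes),
    where each (stream, frame) row sums to 1.
    Returns an array of shape (num_streams, num_frames).
    """
    # Clip to avoid log(0) for one-hot or near-one-hot posteriors.
    p = np.clip(posteriors, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def inverse_entropy_combine(posteriors, eps=1e-12):
    """Combine streams frame by frame using inverse-entropy weights.

    ASSUMPTION: this weighting rule is an illustrative choice, not taken
    from the source. A stream with lower entropy (more confident) receives
    a higher weight. Weights are normalized to sum to 1 in every frame.
    Returns (combined posteriors of shape (num_frames, num_classes),
             weights of shape (num_streams, num_frames)).
    """
    h = entropy(posteriors, eps)
    w = 1.0 / (h + eps)
    w = w / w.sum(axis=0, keepdims=True)
    # Weighted sum of the original (unclipped) posteriors across streams.
    combined = (w[..., None] * posteriors).sum(axis=0)
    return combined, w

# Usage: one frame, three classes, one confident stream and one uniform stream.
post = np.array([[[0.9, 0.05, 0.05]],
                 [[1 / 3, 1 / 3, 1 / 3]]])
combined, weights = inverse_entropy_combine(post)
print(weights[:, 0], combined[0])
```

In this toy case, the confident first stream receives most of the weight (about 0.74), so the combined posteriors stay close to its output. Because the weights are recomputed for every frame, a stream degraded by noise in a given frame tends to produce flatter posteriors and is automatically down-weighted there. That is one way a system could adapt dynamically to many streams.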

doi: 10.21437/Interspeech.2009-747

Cite as: Zhao, S.Y., Ravuri, S., Morgan, N. (2009) Multi-stream to many-stream: using spectro-temporal features for ASR. Proc. Interspeech 2009, 2951-2954, doi: 10.21437/Interspeech.2009-747

  author={Sherry Y. Zhao and Suman Ravuri and Nelson Morgan},
  title={{Multi-stream to many-stream: using spectro-temporal features for ASR}},
  booktitle={Proc. Interspeech 2009},