11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Cross-Lingual and Multi-Stream Posterior Features for Low Resource LVCSR Systems

Samuel Thomas, Sriram Ganapathy, Hynek Hermansky

Johns Hopkins University, USA

We investigate approaches for building large vocabulary continuous speech recognition (LVCSR) systems for new languages or new domains using limited amounts of transcribed training data. In these low resource conditions, the performance of conventional LVCSR systems degrades significantly. We propose to train low resource LVCSR systems with additional sources of information, such as annotated data from other languages (German and Spanish) and multiple acoustic feature streams (short-term and modulation features). We train multilayer perceptrons (MLPs) on these sources of information and use Tandem features derived from the MLPs for low resource LVCSR. In our experiments, the proposed system, trained using only one hour of English conversational telephone speech (CTS), provides a relative improvement of 11% over the baseline system.
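
As a rough illustration of the Tandem idea mentioned in the abstract, the sketch below turns per-frame MLP outputs from two feature streams into log-posterior features. It is a minimal sketch under stated assumptions, not the authors' implementation: the activation values are made up, the function names (`softmax`, `merge_streams`, `tandem_frame`) are hypothetical, and simple averaging is used as a stand-in merging rule because the abstract does not say how the streams are combined. In typical Tandem setups the log posteriors are also decorrelated (for example with PCA) before being passed to the recognizer; that step is omitted here for brevity.

```python
import math


def softmax(activations):
    """Convert raw MLP output activations into posteriors that sum to 1."""
    peak = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def merge_streams(posterior_streams):
    """Average per-class posteriors across streams.

    Averaging is an assumed, simple merging rule chosen for illustration.
    """
    n_streams = len(posterior_streams)
    n_classes = len(posterior_streams[0])
    return [
        sum(stream[k] for stream in posterior_streams) / n_streams
        for k in range(n_classes)
    ]


def tandem_frame(posteriors, floor=1e-10):
    """Log-compress posteriors; the floor avoids taking log(0)."""
    return [math.log(max(p, floor)) for p in posteriors]


# Hypothetical activations for one frame and three phone classes,
# one vector per feature stream (values invented for the example).
short_term = softmax([2.0, 0.5, -1.0])
modulation = softmax([1.5, 1.0, -0.5])

merged = merge_streams([short_term, modulation])
features = tandem_frame(merged)
```

The resulting `features` vector would then be used as (or appended to) the acoustic observation for that frame in the LVCSR system.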


Bibliographic reference. Thomas, Samuel / Ganapathy, Sriram / Hermansky, Hynek (2010): "Cross-lingual and multi-stream posterior features for low resource LVCSR systems", in INTERSPEECH-2010, 877-880.