INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Robust In-Car Spelling Recognition - A Tandem BLSTM-HMM Approach

Martin Wöllmer (1), Florian Eyben (1), Björn Schuller (1), Yang Sun (1), Tobias Moosmayr (2), Nhu Nguyen-Thien (3)

(1) Technische Universität München, Germany
(2) BMW Group, Germany
(3) Continental Automotive GmbH, Germany

As an intuitive, hands-free input modality, automatic spelling recognition is especially useful for in-car human-machine interfaces. However, today’s speech recognition engines find it extremely challenging to cope with similar-sounding spelling sequences in the presence of noise, such as the driving noise inside a car. We therefore propose a novel Tandem spelling recogniser that combines a Hidden Markov Model (HMM) with a discriminatively trained bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. The BLSTM network captures long-range temporal dependencies to learn the properties of in-car noise, which makes the Tandem BLSTM-HMM robust both to speech signal disturbances at extremely low signal-to-noise ratios and to mismatches between training and test noise conditions. Experiments considering various driving conditions reveal that our Tandem recogniser outperforms a conventional HMM by up to 33%.
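The abstract does not give the network configuration or spell out how the BLSTM outputs are coupled to the HMM, so the following is only a minimal, generic sketch of what "bidirectional" means for an LSTM layer. It assumes a standard LSTM cell with input, forget and output gates and no peephole connections; the helper names (`lstm_step`, `run_lstm`, `blstm`), the layer sizes and the random weights are all hypothetical and do not come from the paper. One LSTM reads the feature frames forwards, a second reads them backwards, and their hidden states are concatenated per frame, so every output frame depends on both past and future context.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_params(in_dim, n, rng):
    # Hypothetical random weights: 4 gate blocks of n units over [input; previous hidden state].
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + n)] for _ in range(4 * n)]
    b = [0.0] * (4 * n)
    return W, b

def lstm_step(x, h, c, W, b, n):
    # One time step of a standard LSTM cell (no peepholes).
    z = x + h  # concatenate input frame and previous hidden state
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(v) for v in pre[0:n]]            # input gate
    f = [sigmoid(v) for v in pre[n:2 * n]]        # forget gate
    o = [sigmoid(v) for v in pre[2 * n:3 * n]]    # output gate
    g = [math.tanh(v) for v in pre[3 * n:4 * n]]  # candidate cell input
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(seq, W, b, n):
    # Run one LSTM over a sequence of frames, returning the hidden state per frame.
    h, c = [0.0] * n, [0.0] * n
    out = []
    for x in seq:
        h, c = lstm_step(x, h, c, W, b, n)
        out.append(h)
    return out

def blstm(seq, fwd, bwd, n):
    # Forward pass over the frames, backward pass over the reversed frames,
    # then re-align the backward outputs and concatenate per frame.
    f_out = run_lstm(seq, fwd[0], fwd[1], n)
    b_out = run_lstm(seq[::-1], bwd[0], bwd[1], n)[::-1]
    return [hf + hb for hf, hb in zip(f_out, b_out)]

# Usage with hypothetical sizes: 5 frames of 3-dimensional features, 4 units per direction.
rng = random.Random(0)
in_dim, n = 3, 4
fwd = init_params(in_dim, n, rng)
bwd = init_params(in_dim, n, rng)
frames = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(5)]
outputs = blstm(frames, fwd, bwd, n)
print(len(outputs), len(outputs[0]))  # 5 8
```

Because the backward direction has already seen the final frame when it produces the output for the first frame, perturbing the last frame changes the second half of the first output vector while leaving its forward half untouched. This two-sided context is the property the abstract relies on when it says the BLSTM captures long-range temporal dependencies.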


Bibliographic reference: Wöllmer, Martin / Eyben, Florian / Schuller, Björn / Sun, Yang / Moosmayr, Tobias / Nguyen-Thien, Nhu (2009): "Robust in-car spelling recognition - a tandem BLSTM-HMM approach", In INTERSPEECH-2009, 2507-2510.