8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Two-Stage System for Robust Neutral/Lombard Speech Recognition

Hynek Bořil (1), Petr Fousek (1), Harald Höge (2)

(1) Czech Technical University in Prague, Czech Republic
(2) Siemens AG, Germany

The performance of current speech recognition systems deteriorates significantly in strongly noisy environments. This can be attributed both to background noise and to the Lombard effect (LE). Attempts to build LE-robust systems often display a tradeoff between LE-specific improvements and portability to neutral speech. Therefore, for LE-robust recognition, it seems more effective to use a set of condition-dedicated subsystems driven by a condition classifier than to attempt a single universal recognizer.
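The classifier-driven architecture argued for above can be sketched as a small dispatcher: a condition classifier picks a label, and the utterance is decoded by the subsystem dedicated to that label. This is a minimal illustration only. All names (`ConditionDispatcher`, `toy_classifier`, the 0.25 energy threshold, the placeholder transcripts) are hypothetical, and the energy threshold is a toy stand-in, not the classifier or features developed in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical types: an utterance is modelled as a plain sequence of samples.
Utterance = Sequence[float]
Recognizer = Callable[[Utterance], str]
StyleClassifier = Callable[[Utterance], str]


@dataclass
class ConditionDispatcher:
    """Route each utterance to the recognizer dedicated to its detected condition."""
    classify: StyleClassifier
    recognizers: Dict[str, Recognizer]

    def recognize(self, utterance: Utterance) -> str:
        # Stage 1: decide which condition (e.g. "neutral" or "lombard") applies.
        condition = self.classify(utterance)
        # Stage 2: decode with the subsystem dedicated to that condition.
        return self.recognizers[condition](utterance)


def toy_classifier(utterance: Utterance) -> str:
    # Hypothetical stand-in: mean signal energy above a fixed threshold is
    # labelled "lombard". This is NOT the paper's neutral/LE classifier.
    energy = sum(x * x for x in utterance) / len(utterance)
    return "lombard" if energy > 0.25 else "neutral"


# Placeholder recognizers; real ones would use style-specific front-ends.
dispatcher = ConditionDispatcher(
    classify=toy_classifier,
    recognizers={
        "neutral": lambda utt: "neutral-model transcript",
        "lombard": lambda utt: "lombard-model transcript",
    },
)

print(dispatcher.recognize([0.1, -0.1]))  # quiet input -> neutral subsystem
print(dispatcher.recognize([0.9, -0.8]))  # loud input -> lombard subsystem
```

Keeping the classifier and the recognizers as plain callables means either stage can be swapped independently, which mirrors the idea of condition-dedicated subsystems rather than one universal recognizer.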

This paper focuses on the design of a two-stage recognition system (TSR) comprising a talking style classifier (neutral/LE) followed by two style-dedicated recognizers that differ in their input features. First, a binary neutral/LE classifier is built, with particular attention to developing suitable features for the classification. Second, the performance of common speech features (MFCC, PLP), LE-robust features (Expolog), and newly proposed features is compared on neutral and LE digit recognition tasks. In addition, robustness to changes in average speech pitch and to various noise backgrounds is evaluated. Third, the TSR is built from two recognizers, each using style-specific features. Compared with either a neutral-specific or an LE-specific recognizer on joint neutral/LE speech, the proposed system reduces WER from 6.5 % to 4.2 % on neutral and from 48.1 % to 28.4 % on LE Czech utterances.
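To put the reported WER figures in perspective, the absolute and relative reductions can be computed directly. The sketch below assumes each pair is read as baseline WER followed by TSR WER, exactly as stated in the abstract; the function name `relative_wer_reduction` is a hypothetical helper, and relative reduction is taken as the usual 100 · (baseline − system) / baseline.

```python
def relative_wer_reduction(baseline: float, system: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline


# WER values (%) from the abstract: (single-style baseline, two-stage system).
results = {"neutral": (6.5, 4.2), "lombard": (48.1, 28.4)}

for style, (base, tsr) in results.items():
    print(
        f"{style}: {base} -> {tsr} % WER, "
        f"absolute {base - tsr:.1f} points, "
        f"relative {relative_wer_reduction(base, tsr):.1f} %"
    )
```

This gives relative reductions of about 35.4 % on neutral and 41.0 % on LE utterances, so the larger absolute gain on LE speech (19.7 points) is also the larger relative gain.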


Bibliographic reference. Bořil, Hynek / Fousek, Petr / Höge, Harald (2007): "Two-stage system for robust neutral/Lombard speech recognition", In INTERSPEECH-2007, 1074-1077.