INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Modeling Northern and Southern Varieties of Dutch for STT

Julien Despres (1), Petr Fousek (2), Jean-Luc Gauvain (2), Sandrine Gay (1), Yvan Josse (1), Lori Lamel (2), Abdel Messaoudi (2)

(1) Vecsys Research, France
(2) LIMSI, France

This paper describes how the Northern (NL) and Southern (VL) varieties of Dutch are modeled in the joint Limsi-Vecsys Research speech-to-text transcription systems for broadcast news (BN) and conversational telephone speech (CTS). Using resources from the Spoken Dutch Corpus (CGN), systems were developed and evaluated in the 2008 N-Best benchmark. Modeling techniques used in our systems for other languages proved effective for Dutch; however, it was also found to be important to adapt the acoustic models, language models, and statistical pronunciation generation rules to each variety. This was particularly true for the MLP features, which were effective only when trained separately for Dutch and Flemish. The joint submissions obtained the lowest WERs in the benchmark by a significant margin.


Bibliographic reference. Despres, Julien / Fousek, Petr / Gauvain, Jean-Luc / Gay, Sandrine / Josse, Yvan / Lamel, Lori / Messaoudi, Abdel (2009): "Modeling Northern and Southern Varieties of Dutch for STT", In INTERSPEECH-2009, 96-99.