This paper describes how the Northern (NL) and Southern (VL) varieties of Dutch are modeled in the joint Limsi-Vecsys Research speech-to-text transcription systems for broadcast news (BN) and conversational telephone speech (CTS). The systems were developed using the resources of the Spoken Dutch Corpus (CGN) and evaluated in the 2008 N-Best benchmark. Modeling techniques used in our systems for other languages were found to be effective for Dutch. However, it was also found to be important to adapt the acoustic models, the language models, and the statistical pronunciation generation rules to each variety. This was particularly true for the MLP features, which were effective only when trained separately for Dutch and Flemish. The joint submissions obtained the lowest word error rates (WERs) in the benchmark by a significant margin.
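The benchmark results are reported as WER, that is, the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the N-Best scoring tool itself; the function name `word_error_rate` and the Dutch example sentences are illustrative assumptions, and real scoring pipelines typically apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper): whitespace tokenization only,
    with no case folding or other normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One substitution ("is" -> "was") and one deletion ("vandaag")
# against a five-word reference.
example_wer = word_error_rate("het weer is mooi vandaag", "het weer was mooi")
```

Because insertions are counted as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.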
Bibliographic reference. Despres, Julien / Fousek, Petr / Gauvain, Jean-Luc / Gay, Sandrine / Josse, Yvan / Lamel, Lori / Messaoudi, Abdel (2009): "Modeling northern and southern varieties of Dutch for STT", in INTERSPEECH-2009, 96-99.