In this paper, we present state-of-the-art concept tagging results on a new corpus for Polish spoken language understanding (SLU). For this language, it is the first large-scale corpus (approximately 200 different concepts) that has been semantically annotated and will be made publicly available. Conditional random fields (CRFs) have been shown to give the best results on string-to-string translation problems. Using this approach, we achieve a concept error rate (CER) of 22.6% on an evaluation corpus. To additionally extract attribute values, we combine a statistical and a rule-based approach, which leads to a CER of 30.2%.
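The abstract does not spell out how the concept error rate is computed. A minimal sketch follows, assuming the definition commonly used in SLU evaluation: the Levenshtein distance (substitutions, deletions, insertions) between the reference and hypothesis concept sequences, divided by the number of reference concepts. The function name and the concept labels are hypothetical and are not taken from the corpus.

```python
def concept_error_rate(reference, hypothesis):
    """Edit distance between two concept sequences, divided by the
    reference length (assumed definition, analogous to word error rate)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one concept")
    # prev[j] = edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = prev[j] + 1       # reference concept missing in hypothesis
            insertion = curr[j - 1] + 1  # extra concept in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[m] / n


# Hypothetical concept labels, for illustration only.
ref = ["action", "from_stop", "to_stop", "time"]
hyp = ["action", "from_stop", "time"]
print(concept_error_rate(ref, hyp))  # one deletion over four reference concepts
```

Under this definition the rate can exceed 100% when the hypothesis contains many inserted concepts. When attribute values are scored as well, a concept would presumably count as correct only if both its label and its value match, which is consistent with the higher CER reported for that setting.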
Bibliographic reference. Lehnen, Patrick / Hahn, Stefan / Ney, Hermann / Mykowiecka, Agnieszka (2009): "Large-scale Polish SLU", In INTERSPEECH-2009, 2723-2726.