2nd Workshop on Spoken Language Technologies for Under-Resourced Languages
Universiti Sains, Penang, Malaysia
Efficient large vocabulary continuous speech recognition (LVCSR) of morphologically rich languages is a major challenge because of rapid vocabulary growth. To improve results, various subword units, called morphs, are applied as basic language elements. The improvements over the word baseline, however, vary across languages and tasks, ranging from negative to a halving of the error rate. In this paper we attempt to explore the source of this variability. Different LVCSR tasks of an agglutinative language are investigated in numerous experiments using full vocabularies. The improvements are also compared to previously published results for other languages. Important correlations are found between the morph-based improvements and both vocabulary growth and corpus size.
Bibliographic reference. Tarján, Balázs / Mihajlik, Péter (2010): "On morph-based LVCSR improvements", In SLTU-2010, 10-16.