The improvement gained by changing the basic unit of speech recognition from words to morphs (various sub-word units) varies greatly across tasks and languages. We explore the source of this variability by investigating three LVCSR tasks, each corresponding to a different speech genre of a highly agglutinative language. Transcription results for novel, press conference and broadcast news speech are presented and compared with spontaneous speech recognition results in several experimental setups. A noticeable correlation is observed between an easily computable characteristic of speech recognition tasks in various languages and the relative improvements due to (statistical) morph-based approaches.
Bibliographic reference. Mihajlik, Péter / Tarján, Balázs / Tüske, Zoltán / Fegyó, Tibor (2009): "Investigation of morph-based speech recognition improvements across speech genres", in INTERSPEECH-2009, 2687-2690.