Connecting automatic speech recognizers (ASRs) to language analyzers is difficult because the two may be based on different part-of-speech (POS) systems, so the latter cannot directly analyze the outputs of the former. In addition, in unsegmented languages such as Japanese, the ASR outputs are likely to be segmented into words differently from what the language analyzers expect as input, because the two are developed independently.
A conventional approach is to generate raw text from the ASR outputs and re-analyze it with a morphological analyzer. However, if the ASR outputs contain recognition errors, the morphological analyzer analyzes them incorrectly, even though they also contain correctly recognized words.
To avoid this problem, we propose a morpheme conversion method that directly converts ASR outputs into morpheme sequences suitable for the language analyzers. Our experiments show that morpheme conversion is more robust against recognition errors than the conventional approach.
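The contrast between the two approaches can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the method evaluated in the paper: the romanized tokens, the POS tag names, the table-lookup `convert` function, and the greedy longest-match `reanalyze` function (a stand-in for a real morphological analyzer) are all hypothetical.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)

# Hypothetical table: one ASR token maps to one or more analyzer morphemes.
CONVERSION_TABLE: Dict[Token, List[Token]] = {
    ("tokyo", "ASR-NOUN"): [("tokyo", "Noun")],
    ("ni", "ASR-PARTICLE"): [("ni", "Particle")],
    ("ikimasu", "ASR-VERB"): [("iki", "Verb"), ("masu", "Aux")],
}

# Hypothetical lexicon for the stand-in analyzer (surface -> analyzer POS).
LEXICON: Dict[str, str] = {
    "tokyo": "Noun", "tokyoto": "Noun", "ni": "Particle",
    "iki": "Verb", "masu": "Aux",
}


def convert(asr_tokens: List[Token]) -> List[Token]:
    """Direct conversion: map each ASR token on its own, keeping ASR boundaries."""
    out: List[Token] = []
    for token in asr_tokens:
        surface, _ = token
        # Tokens missing from the table are passed through as unknown.
        out.extend(CONVERSION_TABLE.get(token, [(surface, "Unknown")]))
    return out


def reanalyze(asr_tokens: List[Token], lexicon: Dict[str, str]) -> List[Token]:
    """Conventional route: discard boundaries, then re-segment the raw text."""
    text = "".join(surface for surface, _ in asr_tokens)
    out: List[Token] = []
    i = 0
    while i < len(text):
        # Greedy longest match starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                out.append((text[i:j], lexicon[text[i:j]]))
                i = j
                break
        else:
            out.append((text[i], "Unknown"))
            i += 1
    return out


# "to" is a simulated recognition error (the speaker said "ni").
asr_output = [("tokyo", "ASR-NOUN"), ("to", "ASR-PARTICLE"), ("ikimasu", "ASR-VERB")]
print(convert(asr_output))
print(reanalyze(asr_output, LEXICON))
```

In this toy case, re-analysis merges the correctly recognized "tokyo" with the erroneous "to" into the lexicon entry "tokyoto", whereas the table lookup confines the damage to the erroneous token.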
Bibliographic reference. Imamura, Kenji / Izumi, Tomoko / Sadamitsu, Kugatsu / Saito, Kuniko / Kobashikawa, Satoshi / Masataki, Hirokazu (2011): "Morpheme conversion for connecting speech recognizer and language analyzers in unsegmented languages", In INTERSPEECH-2011, 1405-1408.