We present a comparative analysis of multimodal user inputs with speech and pen gestures, together with their semantically equivalent unimodal (speech-only) counterparts. The multimodal interactions are derived from a corpus collected with a Pocket PC emulator in the context of navigation around Beijing. We devise a cross-modality integration methodology that interprets a multimodal input and paraphrases it as a semantically equivalent unimodal input. This lets us generate parallel multimodal (MM) and unimodal (UM) corpora for comparative study. Empirical analysis based on class trigram perplexities (PP) shows two categories of data: PP_MM = PP_UM and PP_MM < PP_UM. The former involves complementarity across modalities in expressing the user's intent, including occurrences of ellipses. The latter involves redundancy, which will be useful for handling recognition errors by exploring mutual reinforcements. We present explanatory examples of data in these two categories.
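The perplexity comparison described above can be illustrated with a minimal sketch. The abstract does not specify the smoothing method, the word classes, or the toolkit used, so everything below is an assumption made for illustration: the function name `class_trigram_perplexity`, the add-one smoothing, the `word_to_class` mapping, and the simplification of scoring class sequences only (a full class-based model would also include word-given-class probabilities).

```python
import math
from collections import Counter


def class_trigram_perplexity(train, test, word_to_class):
    """Perplexity of `test` under an add-one-smoothed trigram model over classes.

    Hypothetical helper, not the authors' implementation. `train` and `test`
    are lists of tokenized sentences; `word_to_class` maps words to class labels.
    """
    def to_classes(sentence):
        # Words without a class mapping stand for themselves.
        body = [word_to_class.get(w, w) for w in sentence]
        return ["<s>", "<s>"] + body + ["</s>"]

    trigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in train:
        seq = to_classes(sentence)
        vocab.update(seq[2:])  # only symbols that can be predicted
        for i in range(2, len(seq)):
            trigrams[tuple(seq[i - 2:i + 1])] += 1
            contexts[tuple(seq[i - 2:i])] += 1
    v = len(vocab) + 1  # one extra slot for unseen symbols

    log_prob, n = 0.0, 0
    for sentence in test:
        seq = to_classes(sentence)
        for i in range(2, len(seq)):
            num = trigrams[tuple(seq[i - 2:i + 1])] + 1
            den = contexts[tuple(seq[i - 2:i])] + v
            log_prob += math.log2(num / den)
            n += 1
    return 2 ** (-log_prob / n)
```

Under these assumptions, one would call the function once on the multimodal test set and once on its unimodal paraphrases, then compare the two values to see which of the two categories a given subset of the data falls into.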
Bibliographic reference. Hui, Pui-Yu / Zhou, Zhengyu / Meng, Helen (2007): "Complementarity and redundancy in multimodal user inputs with speech and pen gestures", In INTERSPEECH-2007, 2205-2208.