International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

Resource-Rich Research on Natural Language Processing and Understanding

Junichi Tsujii

Microsoft Research Asia

Corpus-based NLP techniques have been studied intensively over the past two decades and have become dominant in our fields, including machine translation, language modeling, and dependency parsing. However, the limitations of the corpus-based approach have also become apparent. Any system, such as a speech-translation system, that has to deal with language as intelligently and robustly as a human being should be able to treat the semantic and pragmatic aspects of language. Since these two aspects concern the relationships of language with the world (semantics) and with the context (pragmatics), observing and using language corpora alone will not solve the problems related to them. We have to introduce into our paradigm extra-linguistic elements that have so far been excluded from the corpus-based, or empirical, approach as non-observable. My talk focuses on how to exploit ontological resources and context-sensitive data, and introduces some of our recent research.

Bibliographic reference. Tsujii, Junichi (2011): "Resource-rich research on natural language processing and understanding", in IWSLT-2011 (abstract).