Speech and Language Technology in Education (SLaTE2007)

The Summit Inn, Farmington, PA, USA
October 1-3, 2007

Text Simplification for Language Learners: A Corpus Analysis

Sarah E. Petersen, Mari Ostendorf

Dept. of Computer Science, Dept. of Electrical Engineering, University of Washington, Seattle, WA, USA

Simplified texts are commonly used by teachers and students in bilingual education and other language-learning contexts. These texts are usually adapted by hand, and teachers report that this is a time-consuming and sometimes challenging task. Our goal is to develop tools that aid teachers by automatically proposing ways to simplify texts. As a first step, this paper presents a detailed analysis of a corpus of news articles and abridged versions written by a literacy organization, in order to learn what kinds of changes people make when simplifying texts for language learners.

