Assessing the reading level of web pages

Sarah E. Petersen, Mari Ostendorf

Reading is an important part of educational development. However, finding appropriate reading material for all students can be difficult and time-consuming for teachers. Our goal is to automate reading level assessment so that teachers can take better advantage of the large amount of text available on the World Wide Web. Reading level assessment tools already exist for clean corpora such as books and magazine articles. This paper extends a particular set of these tools to handle web pages returned by a standard search engine, including a step that pre-filters the pages to eliminate "junk" pages with little or no text. The output of the reading level detectors on web pages is evaluated manually by elementary school teachers, the intended audience for these tools. The tools work well for grades 4 and 5, with room for improvement in grades 2 and 3.
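
To make the pre-filtering idea concrete, the following is a minimal sketch of one way a page with little or no text might be flagged before reading level assessment. It is not the filter used in this work: the function names (`visible_word_count`, `is_junk_page`) and the 50-word threshold are hypothetical, chosen only for illustration, and a simple word count stands in for whatever criteria a real filter would use.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_word_count(html):
    """Return the number of whitespace-separated tokens in the visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def is_junk_page(html, min_words=50):
    """Flag a page as "junk" if it has fewer than min_words visible words.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return visible_word_count(html) < min_words
```

A page consisting mostly of navigation links or scripts would fall below the threshold and be dropped, while an article-length page would pass through to the reading level detectors. A word count alone is a crude signal, so a practical filter would likely need additional evidence about page content.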

