Rewriting syntactically complex English sentences in the FIRST project

Abstract: Simplification of syntactically complex sentences can benefit both human readers with low reading skills and NLP applications that need to process such sentences, but it is a fairly difficult task. One of the main challenges is that syntactic parsers, the obvious tools for tackling this task, tend to perform quite poorly on syntactically complex sentences. In this talk, I will present a method that does not require the output of a parser. Instead, it takes a text with part-of-speech information and combines a specially designed machine learning (ML) classifier with manually crafted rules for rewriting sentences that contain coordinated and subordinated clauses. The method was developed in the context of the FIRST project, which investigated methods for making texts more accessible to people with autism.

The underlying idea of the method is that coordination and subordination relations are explicitly marked by certain tokens, referred to in this research as signs of syntactic complexity. A literature review and corpus analysis were used to design an annotation scheme that encodes the roles of these signs. A corpus was annotated using this scheme and used by our machine learning classifier. The output of the classifier is then passed to the rewriting component, which performs the syntactic simplification. This component relies on 127 different rules for rewriting sentences that contain subordination and 56 different rules for rewriting sentences that contain coordination. An evaluation of the approach will be presented.
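
As a rough illustration of this pipeline (signs, then sign classification, then rule-based rewriting), the following minimal Python sketch may help. It is not the FIRST implementation: the sign inventory, the label names, the Penn Treebank-style tags and the single rewrite rule are all assumptions made for the example, and a toy heuristic stands in for the trained ML classifier.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag), Penn Treebank-style tags assumed

# Hypothetical sign inventory; the annotation scheme's real list is not given here.
SIGNS = {"and", "but", "or", ",", "that", "which"}
COORDINATORS = {"and", "but", "or"}


def find_signs(tokens: List[Token]) -> List[int]:
    """Return the positions of candidate signs of syntactic complexity."""
    return [i for i, (word, _) in enumerate(tokens) if word.lower() in SIGNS]


def classify_sign(tokens: List[Token], i: int) -> str:
    """Toy stand-in for the ML classifier (assumption, not the project's model).

    A coordinator is labelled as linking two clauses only if there is a verb
    on both sides of it; every other sign is labelled OTHER.
    """
    word = tokens[i][0].lower()
    verb_left = any(tag.startswith("VB") for _, tag in tokens[:i])
    verb_right = any(tag.startswith("VB") for _, tag in tokens[i + 1:])
    if word in COORDINATORS and verb_left and verb_right:
        return "CLAUSE_COORD"
    return "OTHER"


def to_sentence(words: List[str]) -> str:
    """Join words, drop sentence-final punctuation tokens, capitalise, add a full stop."""
    words = [w for w in words if w not in {".", ","}]
    if not words:
        return ""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


def rewrite(tokens: List[Token]) -> List[str]:
    """Apply one illustrative rule: split at the first clause coordinator."""
    for i in find_signs(tokens):
        if classify_sign(tokens, i) == "CLAUSE_COORD":
            left = [w for w, _ in tokens[:i]]
            right = [w for w, _ in tokens[i + 1:]]
            return [to_sentence(left), to_sentence(right)]
    # No clause-level sign found: leave the sentence as it is.
    return [to_sentence([w for w, _ in tokens])]


compound = [("The", "DT"), ("dog", "NN"), ("barked", "VBD"), ("and", "CC"),
            ("the", "DT"), ("cat", "NN"), ("ran", "VBD"), (".", ".")]
noun_coord = [("Cats", "NNS"), ("and", "CC"), ("dogs", "NNS"),
              ("sleep", "VBP"), (".", ".")]

print(rewrite(compound))
print(rewrite(noun_coord))
```

The second input shows why sign classification matters: the same token "and" links two clauses in one sentence and two nouns in the other, and only the former should trigger a split.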

This research was carried out jointly with Richard Evans and Iustin Dornescu from the University of Wolverhampton, UK.

