Ruhr-Universität Bochum

Sprachwissenschaftliches Institut

Context-preserving text simplification

Christina Niklaus (Universität St. Gallen), 13.10.2020, 16:00

Sentences with a complex linguistic structure can be not only hard for human readers to comprehend, but also difficult to analyze for semantic applications, whose predictive quality deteriorates with sentence length and complexity. To facilitate such tasks and improve their performance, we present a context-preserving Text Simplification approach that recursively splits and rephrases complex English sentences into a semantic hierarchy of simplified sentences, using a small set of 35 hand-crafted transformation rules.

In the first step, a complex source sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances, each presenting a single event that cannot be further decomposed into meaningful propositions. In this way, we generate a fine-grained intermediate representation with a simple and regular structure that is easier for downstream applications to process.
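
The recursive splitting step can be illustrated with a minimal sketch. It is not the presented system: the two rules below are hypothetical surface patterns invented for this example, whereas the approach described here relies on 35 hand-crafted transformation rules that are not reproduced. The names `TOY_RULES` and `split_recursively` are likewise assumptions.

```python
import re

# Hypothetical toy rules, matched on the surface string only.
# They stand in for the hand-crafted transformation rules of the approach.
TOY_RULES = [
    # "X, and Y" or "X, but Y": split a coordination into two clauses
    re.compile(r"^(?P<left>.+?), (?:and|but) (?P<right>.+)$"),
    # "X because Y": split off a causal subordinate clause
    re.compile(r"^(?P<left>.+?) because (?P<right>.+)$"),
]


def split_recursively(sentence):
    """Apply the first matching rule, then recurse on both parts.

    Recursion stops when no rule matches, i.e. when the sentence is
    treated as a minimal proposition.
    """
    text = sentence.strip().rstrip(".").strip()
    if not text:
        return []
    for rule in TOY_RULES:
        match = rule.match(text)
        if match:
            return (split_recursively(match.group("left"))
                    + split_recursively(match.group("right")))
    # No rule applies: emit the clause as a standalone sentence.
    return [text[0].upper() + text[1:] + "."]


if __name__ == "__main__":
    source = ("The storm hit the coast, and the ferry was cancelled "
              "because the harbour was closed.")
    for proposition in split_recursively(source):
        print(proposition)
```

The sketch only shows the control flow: a rule fires, the sentence is split, and each part is processed again until nothing more can be decomposed. It discards the connective ("and", "because") and therefore loses exactly the contextual information that the next paragraph is concerned with.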

However, a sound and coherent text is not simply a loose arrangement of self-contained units, but rather a logical structure of semantically connected utterances. Consequently, when syntactic Text Simplification operations are carried out without considering discourse implications, the rewriting may easily result in a disconnected sequence of simplified sentences, which makes the text harder to interpret because important contextual information is lost. To preserve the coherence structure and, hence, the interpretability of the output, we establish a contextual hierarchy between the split components and identify the semantic relationship that holds between them. In this way, input sentences are converted into a two-layered hierarchical representation in the form of core sentences and accompanying contexts that are semantically connected via rhetorical relations.
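
One possible way to picture the two-layered representation is as a small tree in which every proposition is marked as core or context and is linked to its contexts by a labelled relation. The sketch below is an assumption made for illustration: the class `Proposition`, the function `render`, and the relation label "Cause" are hypothetical and are not taken from the presented system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Proposition:
    """A simplified sentence in the hierarchy (hypothetical structure)."""
    text: str
    layer: str = "core"  # either "core" or "context"
    # Each link pairs a rhetorical relation label with a dependent proposition.
    links: List[Tuple[str, "Proposition"]] = field(default_factory=list)


def render(node, depth=0, relation=None):
    """Return the hierarchy as indented lines, one proposition per line."""
    prefix = f"[{relation}] " if relation else ""
    lines = ["  " * depth + prefix + f"{node.layer}: {node.text}"]
    for label, child in node.links:
        lines.extend(render(child, depth + 1, label))
    return lines


if __name__ == "__main__":
    reason = Proposition("The harbour was closed.", layer="context")
    # "Cause" is an illustrative relation label, not a documented one.
    core = Proposition("The ferry was cancelled.", links=[("Cause", reason)])
    print("\n".join(render(core)))
```

Because the links are stored on the core sentence, a downstream application can read the core layer alone or follow the relations to recover the context when it needs it.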