Stefanie Dipper - Research | Department of Linguistics, RUB

Research

Research Topics

Computational historical linguistics (Comphist)

We work on developing methods and tools for analyzing historical language data. More information on the resources and results from several research projects that deal with data from Middle High German (1050–1350 CE) and Early New High German (1350–1650 CE) is provided on our site on Computational Historical Linguistics.

Since January 2022, I have been the deputy spokesperson of the Collaborative Research Center SFB 1475, where I co-lead project B01 with Frederik Elwert on automatic metaphor analysis in Middle High German texts: Jesus and Mary as divine healers in service for the salvation of the faithful: A mixed-method analysis of medical metaphorizations in Medieval German texts. I am also co-PI in the SFB’s INF project: Metaphor Base Camp: Providing the common data basis and advancing digital research methods for religious metaphors. One of the goals of the INF project is to develop a human-in-the-loop approach to semi-automatic annotation of metaphors.

I’m also part of the SFB 1102 in Saarbrücken. Together with Augustin Speyer, I’m PI of project C6: Information Management as a Factor for Syntactic Variation in the History of German. We investigate whether information-related factors have an impact on syntactic variation, and if so, how the impact can be modelled. We create synchronic and diachronic corpora, and analyse them qualitatively and quantitatively.

Generating Corpora

In the context of different projects, we have created several corpora, all of which are freely available; see the respective websites.

Anselm corpus: a parallel corpus of text variants from Early New High German
ReM and ReF: Reference corpora of Middle and Early New High German
Litkey corpus: a longitudinal corpus of picture story descriptions produced by German primary school children
NoSta-D corpus: a (small) corpus of German non-standard varieties

Funded Projects

B01: Jesus and Mary as Divine Healers in Service for the Salvation of the Faithful: A Mixed-Method Analysis of Medical Metaphorizations in Medieval German Texts
- Subproject of the CRC 1475 “Metaphors of Religion: Religious Meaning-Making in Language Use”; cooperation with Frederik Elwert and Simone Schultz-Balluff/Halle, 2022–2025
- In this project, we investigate medical metaphors in Christian texts from the Middle Ages. The concept of “salvation” (Seelenheil in German) is one of the fundamental elements of the Christian faith. The notion of a soul that requires healing implies that it has been sick or wounded; medical vocabulary is therefore used to depict this spiritual dimension. Combining hermeneutic and computational methods, we will first manually annotate, analyze, and interpret medical metaphors and then apply quantitative methods to identify metaphorically used words, infer their meaning(s), and support hermeneutic interpretation.
INF: Metaphor Base Camp: Providing the Common Data Basis and Advancing Digital Research Methods for Religious Metaphors
- Subproject of the CRC 1475 “Metaphors of Religion: Religious Meaning-Making in Language Use”; cooperation with Frederik Elwert, Volkhard Krech and Danah Tonne/KIT, 2022–2025
- Within the INF project, scholars of religion, computational linguists, and computer scientists jointly establish the digital research infrastructure of the CRC. The INF project is responsible for three methodological “layers”: (1) The shared Repository, Thesaurus, and Annotation services; (2) Tools for corpus analysis and interactive visualization; (3) Inclusion of advanced computational methods as developed by other sub-projects. This infrastructure fosters a common methodological basis for all projects and enables comparative research across languages and religious traditions.
C6: Information Management as a Factor for Syntactic Variation in the History of German
- Subproject of the CRC 1102 “Information Density and Linguistic Encoding” (IDeaL); cooperation with Augustin Speyer/Saarbrücken, 2018–2026
- In this project, we investigate whether information-related factors have an impact on syntactic variation, and if so, how the impact can be modelled. In the first project phase (2018–2022), we examined extraposition of nominal and prepositional phrases and relative clauses. In the second project phase (2022–2026), we focus on the serialization of constituents relative to each other within the German middle field.

Past Funded Projects

Litkey: Literacy as the key to social participation: Psycholinguistic perspectives on orthography instruction and literacy acquisition
- Litkey comprised four different sub-projects (see the Litkey website). In my project, we built the Litkey corpus, a longitudinal corpus of picture story descriptions produced by German primary school children from grades 2 to 4. We use the corpus to investigate the relationship between spelling errors of beginning writers and the orthographic properties of words. The corpus is freely available.
- VolkswagenStiftung, funding initiative “Schlüsselthemen für Wissenschaft und Gesellschaft”, 2015–2019; cooperation with Eva Belke/Bochum, Sonia Kandel/Grenoble, Claudia Müller/Bochum
St. Anselmi Fragen an Maria — digitale Erschließung, Auswertung und Edition der gesamten deutschsprachigen Überlieferung (14.-16. Jh.)
- This project dealt with a popular medieval dialogue between Anselm of Canterbury and Saint Mary. The text has been preserved in various German dialects in more than 60 manuscripts and prints from the 14th–16th centuries, all of which we have transcribed and annotated. We used the corpus to investigate diatopic and diachronic variation.
- DFG project, 2011–2017
Referenzkorpus Frühneuhochdeutsch (ReF, 1350–1650)
Referenzkorpus Mittelhochdeutsch (ReM, 1050–1350)
- Both projects aimed at building reference corpora of historical German: Middle High German (1050–1350) and Early New High German (1350–1650). Texts have been manually digitized and semi-automatically annotated with parts of speech and morphology as well as lemma information.
- ReF: DFG project, 2011–2018; cooperation with Hans-Joachim Solms/Halle, Ulrike Demske/Saarbrücken, and Klaus-Peter Wegera/Bochum
- ReM: DFG project, 2009–2015; cooperation with Klaus-Peter Wegera/Bochum, Thomas Klein/Bonn, Claudia Wich-Reif/Bonn
Linguistische Annotation von Nichtstandardvarietäten — Guidelines und “Best Practices”
- In this project, we annotated a range of data from non-standard varieties and developed annotation guidelines and best practices. Results, including the annotated corpus, can be found here.
- CLARIN-D curation project, 2012–2013; cooperation with Anke Lüdeling/HU Berlin

Further Projects

Annotation and analysis of abstract anaphora
- Cooperation with Heike Zinsmeister and Varada Kolhatkar
- In this project, we investigate the use of abstract anaphora in German (and English). Abstract anaphors (e.g. “this”, “that”) are used to refer to abstract objects such as events or facts: “Each fall, penguins migrate to Fiji. That’s why I’m going there next month” (example from Byron 2002). In this example, an event (the penguins’ migration) is the abstract antecedent.