This website presents information about reference corpora for Middle High German and Early New High German.

In the early 2000s, a group of historical linguists of German launched an initiative to create a diachronic reference corpus of German. To this end, several related projects successfully applied for funding from the Deutsche Forschungsgemeinschaft (DFG).

To allow for diachronic investigations, all projects collaborate closely on developing common annotation standards. The entire corpus will eventually be available and searchable via the search tool ANNIS.

  • Selection of texts
    The texts included in both REM and REF are balanced for time period, dialect region, genre, and verse vs. prose.

  • Transcriptions
    The transcriptions in REM and REF are diplomatic, i.e. they stay as close to the original manuscripts as possible.

  • Annotations
    Annotations in REM and REF cover the levels lemma, morphology, and part of speech (POS). REM distinguishes between POS annotations that relate to the wordform as such (lemma-related) and annotations that relate to the wordform in its current use (instance-related). We also developed a tagset, HiTS, specifically suited for historical German.

  • Tools for the annotation
    For the annotation of REF, the web-based annotation tool CorA was developed. It supports annotation on multiple levels, editing of the primary data, and modification of token boundaries during annotation.

  • Searching the corpus / Availability
    The REA subcorpus is already available via the search tool ANNIS, which allows searching for metadata, wordforms, annotations, and any combination of these. The REM subcorpus will be made available in the near future; we are currently working on a TEI export of the REM data. The REF corpus is still being annotated.
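
The annotation levels and the combined search described above can be illustrated with a minimal Python sketch. It is not ANNIS, its query language, or the actual data format of REM or REF: the `Token` class, its field names, the `search` helper, and the sample tokens are all hypothetical, and the POS values are generic placeholders rather than HiTS tags. It only shows how a token can carry a lemma-related and an instance-related POS side by side, and how metadata, wordforms, and annotations might be combined in one query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token. All field names are hypothetical."""
    wordform: str      # diplomatic transcription of the token
    lemma: str
    morphology: str
    pos_lemma: str     # POS of the wordform as such (lemma-related)
    pos_instance: str  # POS of the wordform in its current use (instance-related)
    genre: str         # text-level metadata, stored per token for simplicity


def search(tokens, **criteria):
    """Return the tokens whose fields equal every given criterion."""
    return [
        t for t in tokens
        if all(getattr(t, field) == value for field, value in criteria.items())
    ]


# Invented sample data; the tags are placeholders, not HiTS tags.
TOKENS = [
    Token("gebunden", "binden", "participle", "VERB", "ADJ", "verse"),
    Token("gebunden", "binden", "participle", "VERB", "VERB", "prose"),
    Token("hant", "hant", "nom.sg", "NOUN", "NOUN", "prose"),
]

# Tokens that are verbs by lemma but used as adjectives in context.
hits = search(TOKENS, pos_lemma="VERB", pos_instance="ADJ")
print([(t.wordform, t.genre) for t in hits])
```

Keeping the two POS values as separate fields means a query can constrain either one, or both, without losing the other.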