Old High German Text Generator

It is a sad fact that only a woefully small number of Old High German texts survive. Every researcher would love to have access to more samples of this stage of the German language. Good news: now you can!

Enter your desired number of sentences below, and our OHG Text Generator will produce the data for you. You’re welcome!

How it works

This text generator is based on Markov chains, using an implementation by Robert Dawson. We train a trigram model on all Old High German texts from the Old High German reference corpus.

If you’ve never encountered Markov chains before, the basic idea is this: take a training text (in our case, the OHG reference corpus), extract all n-grams of a given length, that is, all contiguous sequences of n words (in our case, three), and count them. When generating a text, these counts are used to estimate how likely each word is to follow a given sequence of words. As a consequence, the generator has a very short “memory”: after three words it has already forgotten what it generated before!
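
For the curious, here is a minimal Python sketch of the counting-and-sampling idea described above. It is not the implementation this generator actually uses: the function names `train_trigram_model` and `generate` are made up for illustration, the training text is a tiny English placeholder rather than the OHG corpus, and we assume the simplest reading of "trigram model", in which each new word is chosen based on the two words before it.

```python
import random
from collections import Counter, defaultdict


def train_trigram_model(tokens):
    """Count, for every pair of adjacent words, which words follow it."""
    model = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        model[(w1, w2)][w3] += 1
    return model


def generate(model, seed, max_words, rng=None):
    """Extend a two-word seed, sampling each next word in proportion to its count."""
    rng = rng or random.Random()
    words = list(seed)
    while len(words) < max_words:
        # Only the last two words are consulted: this is the short "memory".
        followers = model.get(tuple(words[-2:]))
        if not followers:
            break  # this context never occurred in the training text
        candidates = list(followers.keys())
        weights = list(followers.values())
        words.append(rng.choices(candidates, weights=weights)[0])
    return words


# Placeholder training text (English, purely for illustration; not OHG data).
tokens = "the cat sat on the mat and the cat ran to the mat".split()
model = train_trigram_model(tokens)
print(" ".join(generate(model, ("the", "cat"), 8, random.Random(0))))
```

In this toy text, "the cat" is followed once by "sat" and once by "ran", so the sketch picks either with equal probability. A real corpus simply supplies many more contexts and much more uneven counts.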