Ruhr-Universität Bochum
Sprachwissenschaftliches Institut

 
Stefanie Dipper
    Transcription Tool "OTTO"

The transcription tool OTTO is being developed in the context of the DFG-funded project "Reference Corpus Middle High German (1050–1350)".

The tool supports easy, fast typing and renders the transcription instantly, so that its appearance comes as close to the original manuscript as possible. In addition, the tool supports the management of transcription projects in which multiple parties work collaboratively, in a distributed fashion, on collections of documents.

Developers: Lara Kresse, Martin Schnurrenberger, Seong Cho
Supervisor: Stefanie Dipper

   Historical Languages
Historical texts exhibit a large number of character peculiarities (special letters, punctuation marks, abbreviations, etc.) that cannot easily be encoded in, e.g., the ASCII standard. For instance, medieval German texts often use superscript letters to represent remnant or emerging forms of diphthongs, e.g. uͦ. Moreover, some texts distinguish two forms of the (modern) letter <s>, the so-called short vs. long s: <s> vs. <ſ>. Conversely, some texts do not differentiate between the (modern) letters <u> and <v>.
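To make the encoding problem concrete, here is a minimal Python sketch that looks up the Unicode names of the example characters with the standard `unicodedata` module and checks whether each one fits into ASCII. Encoding the superscript form as <u> followed by U+0366 (combining small letter o) is an assumption; a font-specific private-use character would be another option.

```python
import unicodedata

# Example characters from the paragraph above. Encoding <u> with a
# superscript o as "u" + U+0366 (combining small o) is an assumption.
SAMPLES = {
    "short s": "s",
    "long s": "\u017f",
    "u with superscript o": "u\u0366",
}

for label, text in SAMPLES.items():
    names = ", ".join(unicodedata.name(ch) for ch in text)
    # str.isascii() is True only if every code point is below 128.
    print(f"{label}: ascii={text.isascii()} ({names})")
```

Only the short s survives an ASCII encoding; both the long s and the combining superscript lie outside the 7-bit range.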

Diplomatic transcription aims at reproducing a large range of features of the original manuscript or print, such as large initials or variant letter forms.

   OTTO: "Online Transcription Tool"
OTTO is an online transcription tool for editing, viewing and storing historical language data. The tool is written in PHP and also uses some JavaScript; data is stored in a MySQL database. Any server running PHP > 5.2 can host OTTO. Users can log in to the tool from anywhere using a standard web browser.

OTTO's functions are grouped into several menus. The Documents menu hosts all tasks that concern the transcription itself: New starts a new transcription, and Open loads a document from the OTTO database. The View option shows the transcription in its original layout, i.e., in the form of pages, page sides and columns. This format can also be used to print a paper version.
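The layout view amounts to grouping lines by page, page side and column. The following Python sketch shows one way to do that; the `Line` record and the printed markers are hypothetical and do not reflect OTTO's internal data model.

```python
from itertools import groupby
from typing import NamedTuple


class Line(NamedTuple):
    """One transcribed line and its (hypothetical) position in the manuscript."""
    folio: int    # leaf number
    side: str     # "r" (recto) or "v" (verso); "r" sorts before "v"
    column: str   # column label, e.g. "a" or "b"
    text: str     # diplomatic Unicode form of the line


def layout_view(lines: list[Line]) -> str:
    """Group lines by page side and column, mimicking the manuscript layout."""
    out: list[str] = []
    # sorted() is stable, so lines keep their original order within a column.
    ordered = sorted(lines, key=lambda ln: (ln.folio, ln.side, ln.column))
    for (folio, side), page in groupby(ordered, key=lambda ln: (ln.folio, ln.side)):
        out.append(f"== fol. {folio}{side} ==")
        for column, col_lines in groupby(page, key=lambda ln: ln.column):
            out.append(f"-- column {column} --")
            out.extend(ln.text for ln in col_lines)
    return "\n".join(out)


sample = [
    Line(1, "v", "a", "verso text"),
    Line(1, "r", "a", "first column"),
    Line(1, "r", "b", "second column"),
]
print(layout_view(sample))
```

Sorting before grouping matters because `itertools.groupby` only merges adjacent items.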

The Import function is designed for transcriptions that were not created within OTTO and have to be converted from non-diplomatic into diplomatic form. This is usually the case when a transcription starts out from a digitized (non-diplomatic) edition. Currently, import is supported for plain-text and docx files. Export is available in XML and plain-text format.
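As an illustration of XML export, the Python sketch below serializes header metadata and labelled lines with the standard `xml.etree.ElementTree` module. All element and attribute names (`document`, `header`, `field`, `text`, `line`, `name`, `id`) are hypothetical; OTTO's actual export schema may look quite different.

```python
import xml.etree.ElementTree as ET


def export_xml(header: dict[str, str], lines: list[tuple[str, str]]) -> str:
    """Serialize header metadata and (label, text) lines to an XML string.

    The element and attribute names used here are hypothetical.
    """
    root = ET.Element("document")
    head = ET.SubElement(root, "header")
    for field, value in header.items():
        ET.SubElement(head, "field", name=field).text = value
    body = ET.SubElement(root, "text")
    for label, text in lines:
        ET.SubElement(body, "line", id=label).text = text
    # encoding="unicode" returns a str and leaves characters such as ſ unescaped.
    return ET.tostring(root, encoding="unicode")


print(export_xml({"title": "Sample manuscript"},
                 [("KöMo 1r,1", "\u017funne")]))
```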

The Edit menu is the main area for working on the different aspects of the currently open document. It includes a user-definable header editor for entering metadata about the manuscript, such as its title, author, date of origin, etc.

The screenshot below displays the Edit menu, with a sample header on top, followed by the transcription rules (not expanded in the screenshot), and the text editor at the bottom. Menu items can be expanded and collapsed by clicking on the respective labels.

The tool's core feature is the Text Editor. The upper part of the Text Editor in the screenshot displays the lines that have already been transcribed and saved. Each line is preceded by the bibliographic key (here, KöMo) and the folio and line numbers, which are generated automatically.
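Automatic line labelling of this kind can be sketched in a few lines of Python. Two assumptions are made here that the description does not confirm: line numbers restart at 1 on each folio, and the label format "KEY FOLIO,LINE" is invented for illustration.

```python
def label_lines(key: str, folios: list[tuple[str, list[str]]]) -> list[str]:
    """Prefix every line with the bibliographic key, folio and line number.

    Assumptions: numbering restarts on each folio, and the label format
    "KEY FOLIO,LINE" is hypothetical.
    """
    labelled = []
    for folio, lines in folios:
        for number, text in enumerate(lines, start=1):
            labelled.append(f"{key} {folio},{number} {text}")
    return labelled


for row in label_lines("KöMo", [("1r", ["\u017funne", "\u017fele"]), ("1v", ["ende"])]):
    print(row)
```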
The lower part is dominated by two separate frames. The frame on the left, called Transcription, is the currently "active" field, where the user enters a new transcription or edits an existing one. The transcriber can use substitute characters to encode non-ASCII characters. In the figure, the dollar sign ($) serves as a substitute for long s (<ſ>), and "%." stands for the middle dot. The frame on the right, called Unicode, directly converts the user input into its diplomatic transcription form, using a set of transcription rules. The diplomatic Unicode view thus gives the transcriber immediate feedback on whether the input is correct. (The text starts with the character sequence "*{I*}", which encodes an initial <I> that can extend over two or more lines in height. No Unicode equivalent has been defined for it yet.)
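The conversion from the Transcription frame to the Unicode frame can be approximated by a single substitution pass. This Python sketch is not OTTO's implementation (which is written in PHP); it uses the two substitutes mentioned above and assumes that, when several substitutes match at the same position, the longest one wins.

```python
import re

# Rule table modelled on the examples in the text:
# "$" stands for long s and "%." for the middle dot.
RULES = {
    "$": "\u017f",   # ſ
    "%.": "\u00b7",  # ·
}


def to_unicode(line: str, rules: dict[str, str]) -> str:
    """Render a transcription line into its diplomatic Unicode form."""
    if not rules:
        return line
    # Try longer substitutes first so that a multi-character key is not
    # shadowed by a shorter key that is a prefix of it (assumed policy).
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rules[m.group(0)], line)


print(to_unicode("$unne %. $ele", RULES))
```

Sequences without a rule, such as "*{I*}" for the initial, simply pass through unchanged, which matches the behaviour described for the screenshot.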

Transcription rules take the form of search-and-replace patterns, cf. the screenshot below. The first column specifies the substitute character or sequence to be searched for (e.g. $), the second column the diplomatic Unicode character that replaces it (e.g. ſ). Transcription rules are defined by the user. They can be defined locally, i.e., applying to the current transcription only, or globally, i.e., applying to all documents in OTTO's database. The rules are used to map the lines entered in the Transcription frame to their diplomatic form in the Unicode frame.
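How local and global rules combine is not spelled out above. The Python sketch below assumes that a local rule overrides a global rule with the same search string, and uses `collections.ChainMap` to express that lookup order. Both rule tables, including the `u%o` substitute, are hypothetical.

```python
from collections import ChainMap

# Hypothetical global rules, shared by all documents in the database.
GLOBAL_RULES = {"$": "\u017f", "%.": "\u00b7"}

# Hypothetical local rules for one document: "$" is redefined as plain s,
# and a made-up substitute "u%o" is added for <u> with superscript o.
LOCAL_RULES = {"$": "s", "u%o": "u\u0366"}


def effective_rules(local: dict[str, str], global_: dict[str, str]) -> dict[str, str]:
    """Combine both rule sets; ChainMap looks keys up in `local` first.

    Assumption: a local rule overrides a global rule with the same key.
    """
    return dict(ChainMap(local, global_))


for key, value in sorted(effective_rules(LOCAL_RULES, GLOBAL_RULES).items()):
    print(f"{key!r:8} -> {value}")
```

Building the combined table at lookup time, rather than copying global rules into each document, keeps a later change to a global rule visible in every document that does not override it.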

   Publications

  • Stefanie Dipper and Martin Schnurrenberger (2011). OTTO: A Tool for Diplomatic Transcription of Historical Texts. In Zygmunt Vetulani (ed.): Human Language Technology: Challenges for Computer Science and Linguistics. 4th Language and Technology Conference, LTC 2009, Revised Selected Papers, pp. 456-467. Springer. (Revised version of Dipper and Schnurrenberger (2009).)
  • Stefanie Dipper and Martin Schnurrenberger (2009). OTTO: A Tool for Diplomatic Transcription of Historical Texts. In Proceedings of the 4th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 516-520. Poznań, Poland.

 
 
Last modified: Thursday, 04-Aug-2011 11:59:38 CEST | Created by: Stefanie Dipper