Automatic Text Recognition for Historical Documents

Make historical manuscripts and prints from any era and in any language and script machine-readable with the web application Transkribus

Time and place: Jan. 9, 2024 9:00 AM – 4:00 PM, GSH: DSC-Oasen

In this hands-on workshop, learners will get an introduction to the application Transkribus. They will gain insight into the main features of the application: how to create collections of digital images of historical documents, how to add metadata, how to navigate the workspace, and how to share collections and manage users. They will acquire a basic understanding of machine learning for handwritten text recognition (HTR) and of how to use public HTR models for their work. Building upon this, learners can train their own model for layout and text recognition.

In the morning, we will focus on getting used to the Transkribus web application and on how to create and manage your own collection of digital images. We will look into how HTR works and what can and cannot be achieved through automatic text recognition with Transkribus.

In the afternoon, we will have time for working on our own materials. You will run automatic layout detection on a few pages of digital images and practice manual correction of layout features. We will then walk you through the text recognition process, and you can continue working with your materials. Since text recognition might take some time, example materials will be provided for participants to work on.

We will round off the workshop with a discussion of participants' projects and plan a follow-up session for those who are interested.

Learning outcomes

foundational knowledge of the functionality of the web application
basic understanding of how handwritten text recognition (HTR) works
practiced creating collections and user management
practiced layout detection and manual layout adjustment
practiced text recognition using a public HTR model
practiced manual correction of automatic text recognition
understanding of how to prepare for and run an HTR model training

Prerequisites

The workshop has no prerequisites.

Target audience

Researchers and research support staff who work with historical handwritten documents or historical prints.

Required materials

A laptop and a modern browser such as Google Chrome, Firefox, or MS Edge are required. Learners may bring digital images of texts, such as manuscripts or historical prints, to work on.

Organizer

Annika Rockenberger

Published Oct. 24, 2023 5:42 PM - Last modified Feb. 2, 2024 12:42 PM