NB! Postponed to autumn semester: Ditching Transkribus: Local Automatic Text Recognition with Loghi

A half-day, hands-on workshop introducing automatic text recognition with the Loghi tool suite.

Time and place: June 11, 2024 (NB: postponed to the autumn semester)

What This Workshop is About

This workshop introduces learners to the automatic text recognition (ATR) tool suite “Loghi”. This collection of tools can be run locally, i.e., without uploading data to an external server or a cloud service. Uploading data in this way may not be an option for projects with strict data protection guidelines.

The workshop will briefly introduce ATR, its capabilities, and its limitations. Afterwards, learners will be introduced to Loghi’s command-line interface. We will practise using a previously trained model to recognise text, preparing our own data for training, and using that data to train a new model.

Instructions and Setup

Find instructions and detailed course content on the GitHub course website.

Learning Outcomes

a basic understanding of automatic text recognition (ATR)
knowledge of what training data for ATR looks like
knowledge of the tools offered by the Loghi tool suite
practice in using a pre-trained model in Loghi to recognise text
practice in preparing data for training ATR models in Loghi
practice in training a custom ATR model in Loghi

Prerequisites

Basic knowledge of the Unix shell (navigating through directories and executing commands) and the ability to install Docker on your laptop.

Target Audience

Researchers, research support staff, research software engineers, and developers who work with handwritten documents or printed material and who would like to keep their data local when using automatic text recognition.

Required Material

A laptop with Docker installed. Setup instructions will be provided a few weeks ahead of the workshop. Demo data will be supplied, but learners may also bring their own digital images of texts to experiment with.

Register

Coming soon!

Programme

We will take a fifteen-minute break halfway through the workshop.

Coffee/tea will be available from 08:30.

Victuals

We will serve hot and cold beverages & nibbles.

About Raphaela Heil

Raphaela Heil is a software engineer at the Popular Movements’ Archive Uppsala, working with automatic text recognition and document image processing. She recently defended her thesis “Document Image Processing for Handwritten Text Recognition: Deep Learning-based Transliteration of Astrid Lindgren’s Stenographic Manuscripts” (2023). Raphaela is a certified Carpentries and CodeRefinery instructor.

About BærUt!

BærUt! is a competence hub at the University of Oslo for promoting digital scholarly editions (DSEs). Our ambition is to consolidate expertise and knowledge in the field, gather researchers and practitioners, and, in the long run, create the foundation for a common platform for digital editions. We work closely with researchers, developers, and cultural institutions to digitise historical and cultural text documents and ensure these resources are accessible and valuable for academic use.

Questions?

Send an email to the project leader, Annika Rockenberger.

Published Feb. 29, 2024 11:16 AM - Last modified June 6, 2024 6:42 PM