What This Workshop is About
This workshop introduces learners to the automatic text recognition (ATR) tool suite “Loghi”. This collection of tools can be run locally, i.e., without uploading data to an external server or a cloud service. This matters for projects whose stricter data protection guidelines rule out such uploads.
The workshop will briefly introduce ATR, its capabilities, and its limitations. Afterwards, learners will be introduced to Loghi’s command-line interface, and we will practise using a previously trained model to recognise text, preparing our own data for training, and using that data to train a new model.
Instructions and Setup
Find instructions and detailed course content on the GitHub course website.
Learning Outcomes
- basic understanding of automatic text recognition (ATR)
- knowledge of what training data for ATR looks like
- knowledge of what tools are offered by the Loghi tool suite
- practice in using a pre-trained model in Loghi to recognise texts
- practice in preparing data for training ATR models in Loghi
- practice in training a custom ATR model in Loghi
Prerequisites
Basic Unix shell knowledge (navigating directories and executing commands) and the ability to install Docker on your laptop.
Target Audience
Researchers, research support staff, research software engineers, and developers who work with handwritten documents or printed material and who would like to keep their data local when using automatic text recognition.
Required Material
A laptop with Docker installed. Setup instructions will be provided a few weeks ahead of the workshop. Demo data will be supplied, but learners may also bring their own digital images of texts to experiment with.
Register
Coming soon!
Programme
We will take a fifteen-minute break halfway through the workshop.
Coffee/tea will be available from 08:30.
Victuals
We will serve hot and cold beverages & nibbles.
About Raphaela Heil
Raphaela Heil is a software engineer at the Popular Movements’ Archive Uppsala, working with automatic text recognition and document image processing. She recently defended her thesis “Document Image Processing for Handwritten Text Recognition: Deep Learning-based Transliteration of Astrid Lindgren’s Stenographic Manuscripts” (2023). Raphaela is a certified Carpentries and CodeRefinery instructor.
About BærUt!
BærUt! is a competence hub at the University of Oslo for promoting digital scholarly editions (DSEs). Our ambition is to consolidate expertise and knowledge in the field, gather researchers and practitioners, and, in the long run, create the foundation for a common platform for digital editions. We work closely with researchers, developers, and cultural institutions to digitise historical and cultural text documents and ensure these resources are accessible and valuable for academic use.
Questions?
Send an email to the project leader, Annika Rockenberger.