Text Recognition in Printed Historical Documents

Laarhoven, T.M. van (2010) Text Recognition in Printed Historical Documents. Master's Thesis / Essay, Computing Science.

In this thesis we work on recognizing the text in the book "Rerum Frisicarum Historia" by Ubbo Emmius. Currently, books like this are digitized by hand, a labor-intensive and expensive process. The focus of this thesis is on the algorithms that we have developed for various steps of the character recognition process. Besides these algorithms, we have also developed an OCR system that is useful to historians.

Our OCR program is structured as a pipeline. It takes a photograph or scan of a book page as input and outputs a plain text representation of that page. The pipeline consists of the following steps:

1. Converting the image to grayscale and processing it to improve contrast.
2. Selecting the area containing the body text; the rest of the image is discarded.
3. Splitting the body text into lines.
4. Splitting each line into 'components', which form characters.
5. Recognizing the text on each line.

All the steps in this pipeline are treated in turn.
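
To make the shape of such a pipeline concrete, here is a minimal Python sketch of steps 1, 3 and 4. It is not the thesis's method: the abstract does not say which algorithms are used, so this sketch assumes simple min-max contrast stretching and projection profiles (a row or column belongs to a line or component if it contains any ink). The function names, the fixed threshold of 0.5, and the list-of-lists image representation are all assumptions made for illustration. Steps 2 and 5 (body text selection and recognition) are left out.

```python
from typing import List, Sequence, Tuple

Gray = List[List[float]]   # rows of gray values in [0, 1]; 0.0 is black ink
Span = Tuple[int, int]     # half-open index range: (start, end)


def to_gray(rgb: Sequence[Sequence[Tuple[int, int, int]]]) -> Gray:
    """Step 1 (assumed form): average the channels, then stretch contrast to [0, 1]."""
    gray = [[sum(px) / (3 * 255.0) for px in row] for row in rgb]
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a uniform image
    return [[(v - lo) / span for v in row] for row in gray]


def runs(flags: Sequence[bool]) -> List[Span]:
    """Return the half-open index ranges of consecutive True values."""
    out: List[Span] = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out


def split_lines(gray: Gray, threshold: float = 0.5) -> List[Span]:
    """Step 3 (assumed form): consecutive pixel rows that contain ink form a text line."""
    has_ink = [any(v < threshold for v in row) for row in gray]
    return runs(has_ink)


def split_components(gray: Gray, line: Span, threshold: float = 0.5) -> List[Span]:
    """Step 4 (assumed form): within a line, consecutive inked columns form a component."""
    top, bottom = line
    width = len(gray[0])
    has_ink = [
        any(gray[y][x] < threshold for y in range(top, bottom))
        for x in range(width)
    ]
    return runs(has_ink)
```

A projection profile of this kind only separates components divided by a fully blank column, so touching or overlapping glyphs, which are common in historical print, would need a more careful method.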

Item Type: Thesis (Master's Thesis / Essay)
Degree programme: Computing Science
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 15 Feb 2018 07:44
Last Modified: 15 Feb 2018 07:44
