Fuzzy Spatial Relations for Document Layout Analysis

Kuiper, E. and Wieringa, R. (1999) Fuzzy Spatial Relations for Document Layout Analysis. Master's Thesis / Essay, Computing Science.

Abstract

Document Understanding (DU) is the process of converting a document from its paper form to an electronic, editable form. One step of Document Understanding is the labeling of blocks on the page; this process is called Document Layout Analysis. Document pages are segmented, which results in blocks (rectangles) that contain parts of the document such as headings, paragraphs and figures. The blocks are then labeled with logical labels such as paragraph, heading, page number, etc. The research in this report focuses on this labeling. Much research has been done in the area of Document Layout Analysis; most researchers use rule bases or document templates to label the blocks of a document. This report examines the possibility of using fuzzy spatial relations for document layout analysis. Technical papers (IEEE, Elsevier) are used as input to the system, and only the geometrical properties of the blocks are used, not their contents. Two experiments for recognizing the layout structure were performed. The first experiment uses block information (size, position) only. The second experiment examines how the results of the first method can be improved by using fuzzy spatial relations in combination with the iterative method, in which the rule base is processed more than once for a single document page. It can be concluded that it is possible to create a good layout analysis system that uses fuzzy logic and spatial relations: the recognition rate is up to 85% for 103 document pages. Equations, paragraphs and headings present problems; in future research these problems might be solved by using extra information such as the pixel density of the blocks and the font style (bold, italic, etc.).
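
To make the idea of a fuzzy spatial relation between blocks more concrete, here is a minimal Python sketch. It is not taken from the thesis: the abstract does not specify the membership functions or rules, so the names (Block, ramp_down, directly_above, is_short, heading_degree), the thresholds, and the single "heading" rule are all illustrative assumptions. Coordinates are assumed to be fractions of the page size, with y growing downward, and the fuzzy AND is modelled with min, which is one common choice among several.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A segmented page block (hypothetical layout: page-normalized units)."""
    x: float  # left edge
    y: float  # top edge, y grows downward
    w: float  # width
    h: float  # height


def ramp_down(v: float, full: float, zero: float) -> float:
    """Membership that is 1 for v <= full, 0 for v >= zero, linear between."""
    if v <= full:
        return 1.0
    if v >= zero:
        return 0.0
    return (zero - v) / (zero - full)


def horizontal_overlap(a: Block, b: Block) -> float:
    """Shared horizontal extent as a fraction of the narrower block, in [0, 1]."""
    left = max(a.x, b.x)
    right = min(a.x + a.w, b.x + b.w)
    overlap = max(0.0, right - left)
    return min(1.0, overlap / min(a.w, b.w))


def directly_above(a: Block, b: Block, near: float = 0.02, far: float = 0.10) -> float:
    """Degree to which a lies directly above b (assumed thresholds near/far)."""
    gap = b.y - (a.y + a.h)  # vertical distance from a's bottom to b's top
    if gap < 0:
        return 0.0  # a does not end before b starts
    closeness = ramp_down(gap, near, far)
    return min(closeness, horizontal_overlap(a, b))  # fuzzy AND as min


def is_short(block: Block, full: float = 0.03, zero: float = 0.08) -> float:
    """Degree to which a block is 'short' (assumed height thresholds)."""
    return ramp_down(block.h, full, zero)


def heading_degree(block: Block, others: list[Block]) -> float:
    """Illustrative rule: short AND directly above some block that is not short."""
    best = max(
        (min(directly_above(block, o), 1.0 - is_short(o)) for o in others),
        default=0.0,
    )
    return min(is_short(block), best)


if __name__ == "__main__":
    heading = Block(x=0.1, y=0.10, w=0.3, h=0.02)
    paragraph = Block(x=0.1, y=0.13, w=0.8, h=0.20)
    print("heading-like degree:", heading_degree(heading, [paragraph]))
    print("paragraph as heading:", heading_degree(paragraph, [heading]))
```

The sketch uses geometry only, in line with the abstract's statement that block contents are not used. An iterative variant, as described above, would re-evaluate such rules over several passes so that labels assigned to neighboring blocks in one pass can inform the next; that loop is omitted here.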

Item Type: Thesis (Master's Thesis / Essay)
Degree programme: Computing Science
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 15 Feb 2018 07:29
Last Modified: 15 Feb 2018 07:29
URI: http://fse.studenttheses.ub.rug.nl/id/eprint/8823
