Indexing compressed text

Lueks, W. (2008) Indexing compressed text. Bachelor's Thesis, Computing Science.

INF-BA-2008-W.Lueks.pdf - Published Version

Abstract

We study a method by Ferragina and Manzini for creating an index of a text. This index allows us to find any string in the original text. What makes the index special is that it is smaller than the original text while still allowing quick searching and recovery of the original text. To understand the performance bounds given by Ferragina and Manzini, we first examine entropy, a measure of information density. Next we examine the details of their method. Finally we design an extension to it. With this extension we can search not only for a specific string in the text but also for more general descriptions of pieces of text. More precisely, we can find all matches for a given regular expression. This lets us answer queries such as ‘give all quoted pieces of text’.
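
To give a feel for the kind of search such an index supports, here is a minimal Python sketch of backward search over a Burrows-Wheeler transform, the idea commonly associated with the Ferragina-Manzini index. It is an illustration under stated assumptions, not the thesis's implementation: the function names `bwt` and `count_occurrences` are hypothetical, the transform is built naively from sorted rotations, the occurrence counts are computed by scanning, and nothing is compressed. The real index stores this information in compressed form.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive; illustration only).

    Assumes the sentinel "\0" does not occur in `text`; it sorts before
    every other character.
    """
    s = text + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(text, pattern):
    """Count occurrences of a non-empty `pattern` in `text` by backward search."""
    last = bwt(text)

    # smaller[c] = number of characters in the text that sort strictly before c.
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    smaller = {}
    total = 0
    for ch in sorted(counts):
        smaller[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in last[:i]. A real index answers this without
        # scanning; the linear scan here is a simplification.
        return last[:i].count(ch)

    # Half-open range [lo, hi) of sorted rotations prefixed by the current suffix
    # of the pattern. The pattern is processed from its last character to its first.
    lo, hi = 0, len(last)
    for ch in reversed(pattern):
        if ch not in smaller:
            return 0
        lo = smaller[ch] + occ(ch, lo)
        hi = smaller[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences("abracadabra", "abra")` counts the matches of `abra` without ever scanning the original text for the pattern; only the transformed string and the per-character counts are consulted.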

Item Type: Thesis (Bachelor's Thesis)
Degree programme: Computing Science
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 15 Feb 2018 07:28
Last Modified: 15 Feb 2018 07:28
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/8498
