
Indexing Compressed Text

Lueks, W. (2008) Indexing Compressed Text. Bachelor's Thesis, Mathematics.

Text
Wouter_Lueks_WB_2008.pdf - Published Version

Download (470kB) | Preview

Abstract

We study a method by Ferragina and Manzini for creating an index of a text. This index allows us to find any string in the original text. What makes this index special is that it is smaller than the original text, while still allowing quick searching and recovery of the original text. In order to understand the performance bounds given by Ferragina and Manzini, we first examine the concept of information density, the entropy. Next we examine the details of the method suggested by Ferragina and Manzini. Finally we design an extension to their method. Using this extension we are able to search not only for any specific string in the text, but also for some more generalized descriptions of pieces of text. More precisely, we can find all matches for a given regular expression. This lets us answer questions like ‘give all quoted pieces of text’.
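
The search idea summarized in the abstract can be illustrated with a minimal sketch of backward search over a Burrows-Wheeler transform, the mechanism underlying Ferragina and Manzini's index. This is an illustration under stated assumptions, not the thesis's implementation: the function names `build_index` and `find` are hypothetical, the text is assumed not to contain the `"\0"` sentinel, and the sketch keeps the full suffix array and uncompressed occurrence tables, so it demonstrates the search procedure only and not the space savings.

```python
from collections import Counter

SENTINEL = "\0"  # assumed absent from the text; sorts before every other character


def build_index(text):
    """Build an uncompressed, illustrative FM-style index (hypothetical helper)."""
    s = text + SENTINEL
    # Naive suffix array: start positions of all suffixes in sorted order.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(s[i - 1] for i in sa)

    # C[c] = number of characters in s that are strictly smaller than c.
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, sa


def find(pattern, index):
    """Return the sorted start positions of pattern, using backward search."""
    C, occ, sa = index
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = build_index("mississippi")
print(find("ssi", index))
```

The pattern is processed from its last character to its first, and each step narrows a range of rows in the sorted suffix list. A practical index would replace the stored suffix array and occurrence tables with compressed, sampled structures.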

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Aiello, M. and Hesselink, W.H.
Degree programme: Mathematics
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 15 Feb 2018 07:28
Last Modified: 17 Apr 2019 12:29
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/8491
