Tracking provenance of change in data science pipelines

Apperloo, Edser (2020) Tracking provenance of change in data science pipelines. Master's Thesis / Essay, Computing Science.

Abstract

A key element of academic research is the ability to reproduce the results of an experiment. To ensure reproducibility, it is critical to know the exact versions of all elements involved. Once this information is known, differences between versions of an experiment can be traced and the reasons for deviating results can be identified. It is therefore important not only to track the versions of both data and computations, but also to track how they evolve over time. This research investigates the different elements of reproducibility and of tracking the provenance of change in data science pipelines. Existing tools for these processes are analyzed and evaluated; subsequently, a novel conceptual architecture for a framework is introduced. This framework aims to assist users in tracking the provenance of change in their pipelines, so that their experiments can be reproduced. The framework is able to track the evolution of code, configurations, and data throughout the pipeline. It is accompanied by a proof-of-concept implementation called Iterum, which is used to evaluate the designed framework by implementing two use cases from the domain. These experiments show that the framework is capable of achieving its goals. However, it relies on users applying the provided abstractions correctly and, because it is still in an alpha stage, it is not yet ready to be evaluated on its usability and accessibility. This thesis focuses on pipelines and code and is the result of joint research.

Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Karastoyanova, D. and Andrikopoulos, V.
Degree programme: Computing Science
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 28 Aug 2020 12:49
Last Modified: 28 Aug 2020 12:49
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/23254