Faculté des Sciences appliquées
MASTER THESIS

Master thesis : Toward functional and distributed R2RML processor

Saillez, Brieuc ULiège
Promotor(s) : Debruyne, Christophe ULiège
Date of defense : 4-Sep-2023/5-Sep-2023 • Permalink : http://hdl.handle.net/2268.2/18377
Details
Title : Master thesis : Toward functional and distributed R2RML processor
Author : Saillez, Brieuc ULiège
Date of defense  : 4-Sep-2023/5-Sep-2023
Advisor(s) : Debruyne, Christophe ULiège
Committee's member(s) : Louveaux, Quentin ULiège
Fontaine, Pascal ULiège
Language : English
Number of pages : 58
Discipline(s) : Engineering, computing & technology > Computer science
Complementary URL : https://gitlab.uliege.be/Brieuc.Saillez/tfe
Institution(s) : Université de Liège, Liège, Belgique
Degree: Master en sciences informatiques, à finalité spécialisée en "intelligent systems"
Faculty: Master thesis of the Faculté des Sciences appliquées

Abstract

The Resource Description Framework (RDF) offers multiple advantages for data storage, which makes transforming data from relational databases into RDF datasets attractive. One prominent approach for generating RDF datasets from relational databases is the W3C RDB to RDF Mapping Language (R2RML). Existing R2RML processors face challenges related to computing time and memory consumption, particularly when dealing with large-scale relational databases. This master's thesis presents a functional and distributed solution for implementing an R2RML processor that runs on a cluster. It proposes a purely functional Scala solution based on Apache Spark. The approach involves an updated Java parser taken from an existing implementation, a transformation of the Java objects into Scala Abstract Data Types (ADTs), a preprocessing step that rewrites referencing object maps into new triples maps, and the generation and writing of the data. In this solution, the work is distributed by relational data row. For modestly sized databases, the solution is slow because of the overhead introduced by Apache Spark. When run on a cluster, it generates data quickly and does not consume excessive memory. On very large datasets, however, it runs into memory problems, although these can be solved.
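The row-based distribution mentioned in the abstract can be pictured with a small sketch. The code below is plain Python, not the thesis's Scala and Spark implementation. The names `TriplesMap`, `PredicateObjectMap`, `triples_for_row` and `generate` are hypothetical, and so are the example IRIs. The sketch assumes constant predicates and plain literal objects, and it skips the IRI percent-encoding and literal escaping that a real R2RML processor must perform. The point it illustrates is that each row is turned into triples by a pure function, so rows can be processed independently of each other.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]  # one relational row: column name -> value (None for NULL)


@dataclass(frozen=True)
class PredicateObjectMap:
    """Hypothetical, simplified predicate-object map (not the thesis's ADT)."""
    predicate: str  # constant predicate IRI
    column: str     # column whose value becomes a plain literal object


@dataclass(frozen=True)
class TriplesMap:
    """Hypothetical, simplified triples map with a template-valued subject."""
    subject_template: str  # e.g. "http://example.org/emp/{EMPNO}"
    predicate_object_maps: Tuple[PredicateObjectMap, ...]


def expand_template(template: str, row: Row) -> Optional[str]:
    """Fill each {COLUMN} placeholder; return None if any referenced value is NULL."""
    columns = re.findall(r"\{([^}]+)\}", template)
    if any(row.get(c) is None for c in columns):
        return None
    # Assumption: values are inserted as-is, without IRI-safe encoding.
    return re.sub(r"\{([^}]+)\}", lambda m: str(row[m.group(1)]), template)


def triples_for_row(tm: TriplesMap, row: Row) -> List[str]:
    """Pure function: one row in, the N-Triples lines for that row out."""
    subject = expand_template(tm.subject_template, row)
    if subject is None:
        return []
    lines = []
    for pom in tm.predicate_object_maps:
        value = row.get(pom.column)
        if value is None:  # a NULL value yields no term, hence no triple
            continue
        # Assumption: no escaping of quotes or backslashes in the literal.
        lines.append(f'<{subject}> <{pom.predicate}> "{value}" .')
    return lines


def generate(tm: TriplesMap, rows: Iterable[Row]) -> List[str]:
    """Sequential stand-in for a distributed flat-map over the rows."""
    return [line for row in rows for line in triples_for_row(tm, row)]
```

Because `triples_for_row` depends only on its arguments, a cluster framework could apply it to partitions of rows in parallel. In Apache Spark this would typically be a `flatMap` over a distributed dataset, although how the thesis wires this up is not described on this page.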


File(s)

Document(s)

File: TFE.pdf
Size: 708.78 kB
Format: Adobe PDF

File: TFE_Abstract.pdf
Size: 61.62 kB
Format: Adobe PDF

Author

  • Saillez, Brieuc ULiège Université de Liège > Master sc. informatiques, à fin.

Promotor(s)

  • Debruyne, Christophe ULiège

Committee's member(s)

  • Louveaux, Quentin ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation : Optimisation discrète
  • Fontaine, Pascal ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes informatiques distribués

Total number of views: 34
Total number of downloads: 5

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.