Large-scale gene regulatory network inference from single-cell RNA seq data
Paquot, Sarah
Promotor(s) :
Geurts, Pierre
Date of defense : 25-Jun-2018/26-Jun-2018 • Permalink : http://hdl.handle.net/2268.2/4574
Details
Title : | Large-scale gene regulatory network inference from single-cell RNA seq data |
Author : | Paquot, Sarah |
Date of defense : | 25-Jun-2018/26-Jun-2018 |
Advisor(s) : | Geurts, Pierre |
Committee's member(s) : | Wehenkel, Louis; Meyer, Patrick; Huynh-Thu, Vân Anh |
Language : | English |
Number of pages : | 92 |
Keywords : | machine learning; XGBoost; GRN inference; clustering; single-cell |
Discipline(s) : | Engineering, computing & technology > Computer science |
Target public : | Researchers; Student |
Institution(s) : | Université de Liège, Liège, Belgique |
Degree: | Master en ingénieur civil en informatique, à finalité spécialisée en "intelligent systems" |
Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] Uncovering and modeling gene regulatory networks (GRNs) is one of the long-standing
challenges in systems biology. The task is to computationally predict, from given gene
expression data, the direct regulatory interactions between transcription factors and
their target genes. Together, these predicted interactions form a GRN. Several techniques
have been proposed to address this problem. Among them, GENIE3 is one of the
top-performing methods, but it has a major drawback: it is slow.
With traditional sequencing methods, only the mean gene expression values over a mix
of millions of cells could be obtained. New techniques now produce single-cell RNA-seq
data, which contain the expression level in each individual cell. This raises two main
challenges. The first is computational, as single-cell data yield much larger expression
matrices than traditional methods. The second is that the data now reveal different cell
types, which could not be distinguished before because only expression values averaged
over many cells were available. One strategy is to cluster the data so that each cluster
corresponds to a cell type present in the data.
Our first contribution in this context is a variant of GENIE3 that uses boosting to make
it faster and applicable to single-cell datasets. The results are very promising: the
variant turns GENIE3 from a very slow method into a very fast one while achieving the
same, and sometimes better, performance. The boosting method does, however, have the
drawback of depending on many parameters. Our second contribution is a set of three
regulatory network-based methods for clustering cells from single-cell data. The results
were not as good as expected, but they call for further investigation in this direction.
Better results could probably be obtained by further analyzing some parameters.
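
The general idea of a boosting-based, GENIE3-style inference can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the usual GENIE3 decomposition (one regression problem per target gene, with candidate regulators ranked by feature importance) and replaces XGBoost, which appears in the keywords, with a small hand-written booster of decision stumps so that the example has no dependencies. The names `fit_stump`, `boosted_importances` and `infer_network`, the hyperparameter values and the synthetic data are all hypothetical.

```python
import random


def fit_stump(X, residual):
    """Find the one-feature threshold split that most reduces squared error."""
    n = len(residual)
    total = sum(residual)
    best = (0.0, None, None, 0.0, 0.0)  # gain, feature, threshold, left mean, right mean
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residual[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold can separate equal values
            right_sum = total - left_sum
            # Squared-error reduction relative to predicting the overall mean.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def boosted_importances(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on y; credit each feature with the share of the initial
    squared error removed by the splits made on that feature."""
    n = len(y)
    mean_y = sum(y) / n
    residual = [v - mean_y for v in y]
    sse = sum(r * r for r in residual)
    importances = [0.0] * len(X[0])
    if sse == 0.0:
        return importances  # constant target: nothing to explain
    initial_sse = sse
    for _ in range(n_rounds):
        gain, j, thr, left, right = fit_stump(X, residual)
        if j is None:
            break
        for i in range(n):
            residual[i] -= learning_rate * (left if X[i][j] <= thr else right)
        new_sse = sum(r * r for r in residual)
        importances[j] += (sse - new_sse) / initial_sse
        sse = new_sse
    return importances


def infer_network(expr, regulators):
    """expr: one row per cell, one column per gene (assumed layout).
    Returns (regulator, target, score) triples, highest score first."""
    edges = []
    for target in range(len(expr[0])):
        inputs = [g for g in regulators if g != target]
        X = [[cell[g] for g in inputs] for cell in expr]
        y = [cell[target] for cell in expr]
        scores = boosted_importances(X, y)
        edges += [(g, target, s) for g, s in zip(inputs, scores)]
    return sorted(edges, key=lambda e: -e[2])


# Synthetic data (assumption): gene 2 is driven by gene 0, gene 1 is unrelated.
rng = random.Random(0)
cells = []
for _ in range(300):
    g0 = rng.gauss(0, 1)
    g1 = rng.gauss(0, 1)
    cells.append([g0, g1, 2.0 * g0 + rng.gauss(0, 0.1)])

ranking = infer_network(cells, regulators=[0, 1])
for regulator, target, score in ranking:
    print(f"gene {regulator} -> gene {target}: {score:.3f}")
```

Each score is the fraction of the target gene's variance removed by splits on the given regulator, so scores remain comparable across target genes. Restricting `regulators` to known transcription factors is what gives the edges a direction. The number of rounds and the learning rate are examples of the parameters on which a boosting-based variant depends.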