Causal feature selection for predicting interventions
Van Buggenhout, François
Promotor(s) :
Geurts, Pierre
Date of defense : 27-Jun-2016/28-Jun-2016 • Permalink : http://hdl.handle.net/2268.2/1370
Details
Title : | Causal feature selection for predicting interventions |
Author : | Van Buggenhout, François |
Date of defense : | 27-Jun-2016/28-Jun-2016 |
Advisor(s) : | Geurts, Pierre |
Committee's member(s) : | Louveaux, Quentin; Wehenkel, Louis; Huynh-Thu, Vân Anh |
Language : | English |
Discipline(s) : | Engineering, computing & technology > Electrical & electronics engineering |
Institution(s) : | Université de Liège, Liège, Belgium |
Degree: | Master en ingénieur civil électricien, à finalité approfondie |
Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
With the advent of high-throughput experiments, researchers, especially in biology, are
increasingly confronted with very large datasets. The number of variables can reach
several thousand, and to deal with this growing number of features, researchers use
feature selection techniques to reduce the size of the datasets and thereby the costs
associated with experimentation.
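As a minimal illustration of the simplest kind of feature selection (a univariate filter, not the method developed in this thesis), the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top `k`. The toy data, the feature names `g1` to `g49`, the helper names `pearson` and `select_top_k`, and the choice `k=2` are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_top_k(columns, target, k):
    """Keep the k features whose absolute correlation with the target is largest.

    columns: dict mapping a feature name to its list of values (one per sample).
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: only g1 and g2 influence the target; g3 to g49 are pure noise.
rng = random.Random(0)
n = 500
columns = {f"g{i}": [rng.gauss(0, 1) for _ in range(n)] for i in range(1, 50)}
target = [2.0 * a - 1.5 * b + rng.gauss(0, 0.5)
          for a, b in zip(columns["g1"], columns["g2"])]

selected = select_top_k(columns, target, k=2)
print(selected)
```

A filter of this kind is one example of a classical feature selection technique: it scores each feature in isolation and ignores how the data were generated.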
In machine learning, the objective is usually to learn a model on one dataset in order
to predict the target variable on a second dataset drawn from the same distribution.
Sometimes, however, the distribution differs between the two datasets. For example, in
biology, an experiment on genes can be performed on populations from various places on
Earth. Since genomes vary between populations, these variations must be taken into
account before selecting genes to predict the outcome of the experiment. Such variations
can also come from external agents, such as drugs, that manipulate some features.
In this thesis, we focused on data generated from causal graphs, because they are
representative of phenomena encountered in biology, such as gene networks. We were
interested in a particular setting where the target is available for an unmanipulated
dataset, but the objective is to predict the target on a manipulated dataset.
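The following minimal sketch illustrates this setting on a hypothetical three-variable linear causal graph `C -> Y -> E`, where `C` is a cause of the target `Y` and `E` is an effect of it. A least-squares model is trained on unmanipulated data, where the target is known, and evaluated on data in which an external agent sets `E` independently of `Y`. The graph, coefficients, noise levels and function names (`sample`, `fit_linear`, `mse`) are assumptions chosen for illustration; the data generator used in the thesis may differ.

```python
import random


def sample(n, manipulated=(), seed=0):
    """Draw n samples from the toy linear causal graph C -> Y -> E.

    Variables listed in `manipulated` are set by an external agent
    independently of their causal parents (an intervention).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.gauss(0, 1)
        y = c + rng.gauss(0, 0.5)
        if "E" in manipulated:
            e = rng.gauss(0, 1)  # the intervention cuts the arrow Y -> E
        else:
            e = y + rng.gauss(0, 0.3)
        rows.append({"C": c, "Y": y, "E": e})
    return rows


def fit_linear(rows, features, target="Y"):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    X = [[1.0] + [r[f] for f in features] for r in rows]
    t = [r[target] for r in rows]
    p = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    b = [sum(x[i] * v for x, v in zip(X, t)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the p x p system A w = b.
    for col in range(p):
        piv = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, p):
            factor = A[row][col] / A[col][col]
            for k in range(col, p):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution; w[0] is the intercept, w[1:] follow `features`.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w


def mse(rows, features, w, target="Y"):
    """Mean squared error of the linear model w on rows."""
    errs = []
    for r in rows:
        pred = w[0] + sum(wk * r[f] for wk, f in zip(w[1:], features))
        errs.append((r[target] - pred) ** 2)
    return sum(errs) / len(errs)


train = sample(1000, seed=0)                    # unmanipulated, target known
test = sample(1000, manipulated={"E"}, seed=1)  # E manipulated by an external agent

results = {}
for feats in (["C", "E"], ["C"]):
    w = fit_linear(train, feats)
    results[tuple(feats)] = (mse(train, feats, w), mse(test, feats, w))
    print(feats, "train MSE %.3f, manipulated MSE %.3f" % results[tuple(feats)])
```

In expectation, the model that uses the effect `E` fits the training data better (MSE about 0.07 versus 0.25) but degrades sharply once `E` is manipulated (about 1.3), while the causes-only model keeps roughly the same error on both datasets.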
We developed a three-stage process to find an optimal subset of features, combining
causal feature selection, filtering, and two algorithms developed in the framework of
this thesis.
The first step was to find a small set of features, containing both manipulated and
unmanipulated variables, that are highly correlated with the target. The two following
steps focused on identifying the manipulated variables and removing the features that
harm prediction.
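The overall shape of such a three-step process can be sketched as follows on a hypothetical toy graph `C1, C2 -> Y -> E1, E2` with extra noise features, where `E1` is manipulated in the unlabelled test data. This is not the thesis's algorithm: step 1 uses a plain correlation filter in place of causal feature selection, and step 2 swaps in a simple hypothetical heuristic (`flag_manipulated`) that flags a selected feature when its correlations with the other selected features change, on average, by more than a threshold between the two datasets. The graph, the threshold `0.5`, `k=4` and all function names are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (0.0 for a constant column)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def sample(n, manipulated=(), seed=0, n_noise=10):
    """Hypothetical linear graph C1, C2 -> Y -> E1, E2, plus independent noise features.

    Returns (columns, target); manipulated variables ignore their parents.
    """
    rng = random.Random(seed)
    names = ["C1", "C2", "E1", "E2"] + [f"N{i}" for i in range(n_noise)]
    cols = {name: [] for name in names}
    target = []
    for _ in range(n):
        c1, c2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = c1 + c2 + rng.gauss(0, 0.5)
        for name in ("E1", "E2"):
            if name in manipulated:
                cols[name].append(rng.gauss(0, 1))  # intervention cuts Y -> name
            else:
                cols[name].append(y + rng.gauss(0, 0.3))
        cols["C1"].append(c1)
        cols["C2"].append(c2)
        for i in range(n_noise):
            cols[f"N{i}"].append(rng.gauss(0, 1))
        target.append(y)
    return cols, target


def step1_select(train_cols, train_y, k):
    """Step 1 (stand-in): keep the k features most correlated with the target."""
    scores = {f: abs(pearson(v, train_y)) for f, v in train_cols.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def flag_manipulated(train_cols, test_cols, features, threshold=0.5):
    """Step 2 (hypothetical heuristic): flag features whose correlations with the
    other selected features change, on average, by more than `threshold`.
    Uses no target values from the test data."""
    flagged = []
    for f in features:
        others = [g for g in features if g != f]
        change = sum(
            abs(pearson(train_cols[f], train_cols[g]) - pearson(test_cols[f], test_cols[g]))
            for g in others
        ) / len(others)
        if change > threshold:
            flagged.append(f)
    return flagged


def pipeline(train_cols, train_y, test_cols, k=4):
    selected = step1_select(train_cols, train_y, k)
    flagged = flag_manipulated(train_cols, test_cols, selected)
    # Step 3: remove the features flagged as manipulated.
    return [f for f in selected if f not in flagged], flagged


train_cols, train_y = sample(1000, seed=0)
test_cols, _ = sample(1000, manipulated={"E1"}, seed=1)  # test target is not used
final, flagged = pipeline(train_cols, train_y, test_cols)
print("flagged:", flagged, "kept:", final)
```

In this toy setting, `E2` is an effect of the target that is not manipulated, so it stays predictive and is kept. Averaging the correlation changes matters: `C1` loses its correlation with `E1` as well, but its correlations with the other selected features are unchanged, so it stays below the threshold.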
Overall, the process significantly improved prediction performance compared with
classical feature selection techniques.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.