
Faculté des Sciences appliquées

Causal feature selection for predicting interventions

Title : Causal feature selection for predicting interventions
Author : Van Buggenhout, François ULiège
Date of defense : 27-Jun-2016/28-Jun-2016
Advisor(s) : Geurts, Pierre ULiège
Committee's member(s) : Louveaux, Quentin ULiège
Wehenkel, Louis ULiège
Huynh-Thu, Vân Anh ULiège
Language : English
Discipline(s) : Engineering, computing & technology > Electrical & electronics engineering
Institution(s) : Université de Liège, Liège, Belgique
Degree: Master en ingénieur civil électricien, à finalité approfondie
Faculty: Master thesis of the Faculté des Sciences appliquées


With the advent of high-throughput experiments, researchers, especially in biology, are
increasingly confronted with very large datasets. The number of variables can reach
several thousand, and to deal with this growing number of features, researchers use
feature selection techniques to reduce the size of the datasets and thus the costs
linked to experimentation.
In machine learning, the objective is to learn a model on one dataset in order to predict
the target variable of a second dataset drawn from the same distribution. Sometimes,
however, the distribution differs between the two datasets. For example, in biology, an
experiment on genes can be carried out on populations from various places on Earth. Since
genomes vary between populations, these variations must be taken into account before
selecting genes to predict the outcome of the experiment.
These variations can also come from external agents, such as drugs that manipulate some […]
In this thesis, we focused on data generated from causal graphs, because such data are
representative of biological phenomena like gene networks. We were interested in a
particular situation where the target is provided for an unmanipulated dataset, but the
objective is to predict the target of a manipulated dataset.
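To make this setting concrete, here is a minimal sketch assuming a hypothetical three-variable linear Gaussian causal graph C -> Y -> E, where C is a cause of the target Y and E is an effect of it. This toy graph is an illustration only, not a dataset or model from the thesis. A one-feature linear predictor is trained on unmanipulated data and evaluated both on unmanipulated test data and on test data where E is manipulated, i.e. set by an external intervention that cuts the edge Y -> E. Only the Python standard library is used.

```python
import random
from statistics import fmean

# Hypothetical toy causal graph (an assumption for illustration):
#   C -> Y -> E
# C is a cause of the target Y; E is an effect (child) of Y.


def sample(n, rng, manipulate_e=False):
    """Draw n (c, e, y) samples; if manipulate_e, E is set by an intervention."""
    rows = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        y = c + rng.gauss(0.0, 1.0)
        if manipulate_e:
            # do(E): E no longer depends on Y, the edge Y -> E is cut.
            e = rng.gauss(0.0, 1.0)
        else:
            # Unmanipulated: E is a noisy copy of Y, hence highly correlated with it.
            e = y + rng.gauss(0.0, 0.1)
        rows.append((c, e, y))
    return rows


def column(rows, i):
    return [r[i] for r in rows]


def fit_1d(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of the linear model (a, b) on (xs, ys)."""
    a, b = model
    return fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


rng = random.Random(0)
train = sample(20_000, rng)                      # unmanipulated, target known
test_obs = sample(20_000, rng)                   # unmanipulated test set
test_man = sample(20_000, rng, manipulate_e=True)  # E manipulated

y_train = column(train, 2)
results = {}
for name, idx in (("cause C", 0), ("effect E", 1)):
    # Each model is trained on unmanipulated data only.
    model = fit_1d(column(train, idx), y_train)
    results[name] = (
        mse(model, column(test_obs, idx), column(test_obs, 2)),
        mse(model, column(test_man, idx), column(test_man, 2)),
    )

for name, (obs, man) in results.items():
    print(f"{name}: MSE unmanipulated = {obs:.2f}, manipulated = {man:.2f}")
```

In this toy example, the effect E gives by far the lowest error on unmanipulated data but a much higher error once it is manipulated, whereas the cause C keeps roughly the same error in both regimes. This is the kind of distribution change a feature selection procedure must anticipate when the target has to be predicted on manipulated data; the actual selection procedure of the thesis is not reproduced here.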
We developed a three-stage process to find an optimal subset of features, involving
causal feature selection, filtering, and two algorithms developed in the framework of this
thesis.
The first step was to find a small set of highly correlated features made up of both
manipulated and unmanipulated variables. The next two steps focused on identifying the
manipulations and removing the detrimental features.
The complete process significantly improved prediction performance compared with
classical feature selection techniques.





Author(s)

  • Van Buggenhout, François ULiège Université de Liège > Master ingé. civ. électr., fin. appr. (ex 2e master)


Committee's member(s)

  • Louveaux, Quentin ULiège Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation : Optimisation discrète
  • Wehenkel, Louis ULiège Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
  • Huynh-Thu, Vân Anh ULiège Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.