Causal feature selection for predicting interventions

Van Buggenhout, François

Date of defense : 27-Jun-2016/28-Jun-2016 • Permalink : `http://hdl.handle.net/2268.2/1370`

Details

Title :	Causal feature selection for predicting interventions
Author :	Van Buggenhout, François
Date of defense :	27-Jun-2016/28-Jun-2016
Advisor(s) :	Geurts, Pierre
Committee's member(s) :	Louveaux, Quentin; Wehenkel, Louis; Huynh-Thu, Vân Anh
Language :	English
Discipline(s) :	Engineering, computing & technology > Electrical & electronics engineering
Institution(s) :	Université de Liège, Liège, Belgique
Degree:	Master en ingénieur civil électricien, à finalité approfondie
Faculty:	Master thesis of the Faculté des Sciences appliquées

Abstract

[en] With the advent of high-throughput experiments, researchers are increasingly
confronted with very large datasets, especially in biology. The number of variables can
reach several thousand. To cope with this growing number of features, researchers
use feature selection techniques to reduce the size of the datasets and thereby reduce
the costs of experimentation.
In machine learning, the objective is to learn a model on one dataset in order to predict
the target variable of a second dataset drawn from the same distribution. Sometimes,
however, the two datasets do not follow the same distribution. For example, in biology, an
experiment on genes can be carried out on populations from different parts of the world.
Since genomes vary between populations, it is important to take these variations into
account before selecting genes to predict the outcome of the experiment.
Such variations can also come from external agents, such as drugs, that manipulate some
features.
In this thesis, we focused on data generated from causal graphs because they are
representative of phenomena encountered in biology, such as gene networks. We were
interested in a particular situation where the target is provided for an unmanipulated
dataset but the objective is to predict the target of a manipulated dataset.
We developed a three-stage process to find an optimal subset of features, involving
causal feature selection, filtering, and two algorithms developed in the framework of this
thesis.
The first step was to find a small set of features made up of highly correlated
variables, both manipulated and unmanipulated. The two following steps focused on
recovering the manipulations and removing the bad features.
The whole process significantly improved prediction performance compared
with classical feature selection techniques.
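
The setting described in the abstract, where a model is trained on unmanipulated data and must predict the target on manipulated data, can be illustrated with a toy example. The sketch below is not the thesis's method. It assumes a hypothetical three-variable linear causal chain (cause -> target -> effect) with made-up coefficients and noise levels, and it models a manipulation as an intervention that sets the effect variable independently of the target. All function names are illustrative.

```python
import random


def simulate(n, manipulate_effect, rng):
    """Draw n samples from a toy causal chain: cause -> target -> effect.

    Assumption for illustration only: when manipulate_effect is True, an
    external agent sets the effect variable at random, which cuts the
    target -> effect dependence.
    """
    rows = []
    for _ in range(n):
        cause = rng.gauss(0.0, 1.0)
        target = 2.0 * cause + rng.gauss(0.0, 0.5)
        if manipulate_effect:
            effect = rng.gauss(0.0, 1.0)
        else:
            effect = target + rng.gauss(0.0, 0.1)
        rows.append((cause, effect, target))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_experiment(seed=0, n=2000):
    """Train on unmanipulated data, then score on both kinds of test data."""
    rng = random.Random(seed)
    train = simulate(n, False, rng)
    test_plain = simulate(n, False, rng)
    test_manip = simulate(n, True, rng)

    results = {}
    for name, col in (("cause", 0), ("effect", 1)):
        model = fit_line([r[col] for r in train], [r[2] for r in train])
        results[name] = {
            "unmanipulated": mse(
                model, [r[col] for r in test_plain], [r[2] for r in test_plain]
            ),
            "manipulated": mse(
                model, [r[col] for r in test_manip], [r[2] for r in test_manip]
            ),
        }
    return results


if __name__ == "__main__":
    for feature, scores in run_experiment().items():
        print(feature, scores)
```

In this toy setup the effect variable is the better predictor on unmanipulated data, because it is almost a copy of the target, but it becomes useless once it is manipulated, while the cause variable keeps roughly the same error. This is the kind of gap between classical correlation-based selection and causal feature selection that the abstract refers to.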

File(s)

Document(s)

FVB_Causal_feature_selection.pdf
Size: 3.08 MB
Format: Adobe PDF

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.
