Gene regulatory network inference from observational and interventional expression data
Smetz, Colin
Supervisor(s): Geurts, Pierre
Defense date: 26-jui-2017/27-jui-2017 • Permanent URL: http://hdl.handle.net/2268.2/2590
Details
Title: | Gene regulatory network inference from observational and interventional expression data |
Author: | Smetz, Colin |
Defense date: | 26-jui-2017/27-jui-2017 |
Supervisor(s): | Geurts, Pierre |
Jury member(s): | Wehenkel, Louis; Huynh-Thu, Vân Anh; Meyer, Patrick |
Language: | English |
Number of pages: | 84 |
Keywords: | gene regulatory network; machine learning; random forest; enriched random forest; knockout; GRN inference; Z-score |
Discipline(s): | Engineering, computing & technology > Computer science |
Target audience: | Researchers; Students |
Institution(s): | Université de Liège, Liège, Belgium |
Degree: | Master's degree in computer science engineering, specialized focus in "intelligent systems" |
Faculty: | Master's theses of the Faculty of Applied Sciences |
Abstract
The problem of reverse-engineering biological networks has attracted considerable attention in recent decades. Studying the interactions that occur inside a living organism is essential to understanding the behavior of biological systems. Advances in computer science and the abundance of new genetic data have raised the question of predicting gene regulatory networks, which describe how some genes regulate the expression of others.
Many methods have already been developed to infer these networks from gene expression data. Among them, GENIE3, a method based on Random Forests, achieved state-of-the-art performance. One drawback of GENIE3, however, is that it cannot exploit the specific characteristics of certain types of gene expression measurements, and may therefore miss useful information. In particular, datasets often include knockouts, which are measurements taken after the deletion of a gene.
This thesis proposes new variants of GENIE3, based on the idea of enriched random forests, that integrate knockout-specific information as weights guiding GENIE3 toward better predictions. The methods are first tested on ideal cases in which a knockout of every gene is available. Better predictions are indeed achieved, and several ways of achieving the best results are highlighted. Realistic cases are then tested; the results are less convincing, although interesting phenomena are discovered.
The second part of the thesis studies the possibility of predicting the effect of knockouts. Differences and similarities with the GRN prediction problem are analyzed, and an evaluation method, although imperfect, is proposed. Several methods are then evaluated, with relatively encouraging results. Some of the reflections initiated here call for future developments.
The possibility of using the proposed weighted GENIE3 methods in other situations is also briefly explained. Substantial improvements are indeed achieved on several datasets without the use of knockouts.
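The abstract describes turning knockout information into weights that guide a forest-based method, and the keywords mention Z-scores. The sketch below illustrates one plausible form of that idea and is not taken from the thesis: it assumes a complete knockout matrix (one knockout per gene, the "ideal case" above), computes an absolute Z-score for each (knocked-out gene, measured gene) pair, and normalizes the scores for one target gene into sampling probabilities over candidate regulators. The function names `knockout_zscores` and `regulator_weights`, the `eps` smoothing term, and the normalization scheme are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def knockout_zscores(knockout_expr):
    """Absolute Z-scores from a complete knockout matrix (illustrative only).

    knockout_expr[i][j] is the expression of gene j measured after gene i
    was knocked out. The mean and standard deviation of gene j are taken
    over all knockout experiments (an assumption of this sketch).
    """
    n = len(knockout_expr)
    columns = list(zip(*knockout_expr))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]

    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # A gene's effect on itself is not a regulatory link, and a
            # constant gene carries no information: leave both at zero.
            if i == j or sigmas[j] == 0:
                continue
            z[i][j] = abs(knockout_expr[i][j] - mus[j]) / sigmas[j]
    return z


def regulator_weights(z, target, eps=1e-6):
    """Normalize the Z-scores for one target gene into probabilities.

    The result could serve as non-uniform feature-sampling weights in an
    enriched-random-forest style learner. eps (hypothetical) keeps every
    candidate regulator selectable even when its Z-score is zero.
    """
    raw = [0.0 if i == target else z[i][target] + eps for i in range(len(z))]
    total = sum(raw)
    return [r / total for r in raw]


# Toy data: 3 genes, one knockout experiment per gene (made-up values).
expr = [
    [0.0, 0.2, 1.0],  # gene 0 knocked out
    [1.0, 0.0, 1.0],  # gene 1 knocked out
    [1.0, 1.0, 0.0],  # gene 2 knocked out
]
z = knockout_zscores(expr)
weights_for_gene_1 = regulator_weights(z, target=1)
```

In this toy example, `weights_for_gene_1` gives each candidate regulator of gene 1 a probability that depends on how far gene 1's expression deviates from its average when that candidate is knocked out, and the target itself receives weight zero.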
The Université de Liège does not guarantee the scientific quality of these student works, nor the accuracy of all the information they contain.