|Title :||Master thesis : Feature selection with deep neural networks|
|Author :||Vecoven, Nicolas|
|Date of defense :||26-Jun-2017/27-Jun-2017|
|Advisor(s) :||Geurts, Pierre|
|Committee member(s) :||Wehenkel, Louis|
|Number of pages :||76|
|Keywords :||[en] Machine learning ; [en] Deep learning ; [en] Feature selection ; [en] Artificial neural networks|
|Discipline(s) :||Engineering, computing & technology > Computer science|
|Institution(s) :||Université de Liège, Liège, Belgique|
|Degree:||Master en ingénieur civil en informatique, à finalité spécialisée en "intelligent systems"|
|Faculty:||Master thesis of the Faculté des Sciences appliquées|
[en] Variable and feature selection have become the focus of much research, especially in bioinformatics, where they have many applications. Machine learning is a powerful tool for selecting features, but not all machine learning algorithms are on an equal footing when it comes to feature selection. Indeed, many methods have been proposed for carrying out feature selection with random forests, which makes them the current go-to model in bioinformatics.
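The thesis's own algorithms are not detailed in this abstract; as context for the random-forest baseline it mentions, a typical impurity-based feature-ranking sketch (using scikit-learn, on a synthetic problem where only the first two of ten features are informative) might look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem: only features 0 and 1 influence the target.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 10))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)

# Mean-decrease-in-impurity importances come for free with the fit.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
print(sorted(ranking[:2].tolist()))  # the two informative features rank first
```

Rankings like this are what the thesis's neural-network methods are compared against.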
On the other hand, thanks to so-called deep learning, neural networks have enjoyed a huge resurgence of interest in the past few years. However, neural networks are black-box models, and very few attempts have been made to analyse their underlying input-output process. Indeed, quite a few articles can be found about feature extraction with neural networks (for which the underlying input-output process does not need to be understood), while very few tackle feature selection.
In this document, we propose new algorithms for carrying out feature selection with deep neural networks. To assess our results, we generate regression and classification problems that allow us to compare each algorithm on multiple fronts: performance, computation time and constraints. The results obtained are very promising, since we manage to achieve our goal by surpassing (or equalling) the performance of random forests (set as our “state-of-the-art” comparison) in every case.
Encouraged by the promising results obtained on artificial datasets, we also tackle the DREAM4 challenge. Because of the very small number of samples available in its datasets, this challenge is supposedly ill-suited to neural networks; we were nevertheless able to achieve near-state-of-the-art results.
Finally, extensions are given for most of our methods. Indeed, the algorithms discussed are very modular and can be adapted to the problem at hand. For example, we explain how one of our algorithms can be adapted to prune neural networks without losing accuracy.
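The pruning adaptation itself is only described in the full thesis; one common, simple heuristic in the same spirit (not necessarily the author's method) scores each input feature by the norm of its outgoing first-layer weights and keeps only the top-scoring inputs:

```python
import numpy as np

def input_feature_scores(W1):
    """Score each input by the L2 norm of its outgoing first-layer
    weights; a larger norm suggests more influence (heuristic only)."""
    return np.linalg.norm(W1, axis=1)

# Hypothetical first-layer weight matrix: 4 inputs x 3 hidden units.
W1 = np.array([[0.9, -1.1, 0.8],    # strongly connected input
               [0.0,  0.1, 0.0],    # nearly disconnected input
               [1.2,  0.7, -0.9],
               [0.05, 0.0, 0.02]])
scores = input_feature_scores(W1)
keep = np.argsort(scores)[::-1][:2]  # select the two top-scoring inputs
print(sorted(keep.tolist()))
```

Dropping the low-scoring inputs (or hidden units, by applying the same score to their weights) is one way such a selection criterion doubles as a pruning criterion.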
Cite this master thesis
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.