
Faculty of Applied Sciences
Thesis
Views: 298 | Downloads: 113

Master thesis : The Lottery Ticket Hypothesis and value-based Deep Reinforcement Learning

Debes, Baptiste ULiège
Supervisor(s): Louppe, Gilles ULiège
Defense date: 28 Jan 2022 • Permanent URL: http://hdl.handle.net/2268.2/13872
Details
Title: Master thesis : The Lottery Ticket Hypothesis and value-based Deep Reinforcement Learning
Author: Debes, Baptiste ULiège
Defense date: 28 Jan 2022
Supervisor(s): Louppe, Gilles ULiège
Jury member(s): Geurts, Pierre ULiège
Fontaine, Pascal ULiège
Language: French
Keywords: [en] Lottery ticket hypothesis
[en] Model pruning
[en] Deep reinforcement learning
[en] DDQN
[en] SAC
[en] Soft-Actor-Critic
Discipline(s): Engineering, computing & technology > Computer science
Target audience: Researchers
Professionals in the field
Students
General public
Institution(s): Université de Liège, Liège, Belgique
Degree: Master: civil engineer in data science, specialized focus
Faculty: Master theses of the Faculty of Applied Sciences

Abstract

[en] The Lottery Ticket Hypothesis (LTH) suggests that randomly initialized, overparametrized neural networks contain subnetworks which, when trained in isolation, perform better than similar subnetworks whose architecture and weights are drawn randomly. Subnetworks matching the Lottery Ticket Hypothesis are referred to as winning tickets because they are the winners of the initialization lottery. An algorithm called Iterative Magnitude Pruning (IMP) was introduced to discover winning tickets. Finding well-performing sparse neural networks is especially interesting because of the potentially large reduction in memory footprint and overall computational burden. Combined, these may lead to a significant reduction in the energy required to perform the same task. Deep Reinforcement Learning (DRL) has introduced algorithms capable of solving complex tasks (dynamic system control, Atari games, board games, ...). In this work we study the combination of deep reinforcement learning and the lottery ticket hypothesis. We focus on two algorithms, namely Double Deep Q-Networks (DDQN) and Soft-Actor-Critic (SAC), which both belong to the fruitful class of value-based methods. We provide the third independent confirmation, in the context of deep reinforcement learning, of the existence of subnetworks matching the Lottery Ticket Hypothesis using Iterative Magnitude Pruning. Our experiments were carried out on standard classic control environments as well as pixel-based environments. We provide experiments and guidelines regarding some important hyperparameters. We suggest a potential ability of winning tickets to robustly preserve low-rank embeddings of the environment's state space. Some of our results suggest that tickets found using IMP are closer than expected to subnetworks that could be found using so-called structured pruning methods. Our experiments also showcase the ability of winning tickets to render useless input variables inactive while keeping good performance on the task.
This result, along with others, indicates a potential ability of winning tickets to be used as feature importance extractors. Finally, a variant of Iterative Magnitude Pruning, which we call pooled pruning, is introduced. We suggest this variant could be beneficial for multi-network algorithms such as Soft-Actor-Critic.
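
For readers unfamiliar with Iterative Magnitude Pruning, the loop can be sketched roughly as follows. This is a minimal illustration of the train, prune, rewind cycle as commonly described in the Lottery Ticket Hypothesis literature, not the thesis's implementation. A toy masked linear regression stands in for DDQN or SAC training, and all names (`prune_by_magnitude`, `iterative_magnitude_pruning`, `train_fn`), the data, and the hyperparameters are assumptions made for this example.

```python
import numpy as np


def prune_by_magnitude(weights, mask, fraction):
    """Drop the given fraction of surviving weights with the smallest magnitude."""
    alive = np.flatnonzero(mask)                 # indices of weights still active
    n_prune = int(fraction * alive.size)
    order = np.argsort(np.abs(weights.ravel()[alive]))
    new_mask = mask.ravel().copy()
    new_mask[alive[order[:n_prune]]] = False
    return new_mask.reshape(mask.shape)


def iterative_magnitude_pruning(init_weights, train_fn, rounds, fraction):
    """Train, prune, rewind to the initial weights, and repeat."""
    mask = np.ones(init_weights.shape, dtype=bool)
    for _ in range(rounds):
        # Each round restarts from the original initialization (rewinding).
        trained = train_fn(init_weights * mask, mask)
        mask = prune_by_magnitude(trained, mask, fraction)
    return init_weights * mask, mask


# Hypothetical toy task: linear regression in which only the first two of
# eight input variables carry signal (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0])


def train_fn(weights, mask, lr=0.1, steps=500):
    """Masked gradient descent on mean squared error."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad * mask                    # pruned weights stay at zero
    return w


init_weights = rng.normal(scale=0.1, size=8)
ticket_weights, ticket_mask = iterative_magnitude_pruning(
    init_weights, train_fn, rounds=2, fraction=0.5
)
print(ticket_mask)
```

In this toy setting the surviving mask should keep only the two informative inputs, which loosely mirrors the abstract's observation that winning tickets can render useless input variables inactive. Whether the same behavior carries over to deep reinforcement learning agents is the subject of the thesis itself.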


File(s)

Document(s)

File
Access Master_thesis.pdf
Description:
Size: 83.42 MB
Format: Adobe PDF
File
Access abstract.pdf
Description:
Size: 242.27 kB
Format: Adobe PDF

Author

  • Debes, Baptiste ULiège Université de Liège > Master ingé. civ. sc. don. à . fin.

Supervisor(s)

Jury member(s)

  • Geurts, Pierre ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
  • Fontaine, Pascal ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes informatiques distribués










All documents available on MatheO are protected by copyright and subject to the usual rules of fair use.
The University of Liège does not guarantee the scientific quality of these student works or the accuracy of all the information they contain.