Master thesis : The Lottery Ticket Hypothesis and value-based Deep Reinforcement Learning

Debes, Baptiste

Date of defense : 28-Jan-2022 • Permalink : `http://hdl.handle.net/2268.2/13872`

Details

Title :	Master thesis : The Lottery Ticket Hypothesis and value-based Deep Reinforcement Learning
Author :	Debes, Baptiste
Date of defense :	28-Jan-2022
Advisor(s) :	Louppe, Gilles
Committee's member(s) :	Geurts, Pierre; Fontaine, Pascal
Language :	French
Keywords :	[en] Lottery ticket hypothesis; Model pruning; Deep reinforcement learning; DDQN; SAC; Soft-Actor-Critic
Discipline(s) :	Engineering, computing & technology > Computer science
Target public :	Researchers; Professionals of domain; Student; General public
Institution(s) :	Université de Liège, Liège, Belgique
Degree:	Master : ingénieur civil en science des données, à finalité spécialisée
Faculty:	Master thesis of the Faculté des Sciences appliquées

Abstract

[en] The Lottery Ticket Hypothesis (LTH) suggests that randomly initialized, overparametrized neural networks contain subnetworks which, when trained in isolation, perform better than similar subnetworks whose architecture and weights are drawn randomly. Subnetworks matching the Lottery Ticket Hypothesis are referred to as winning tickets because they are the winners of the initialization lottery. An algorithm called Iterative Magnitude Pruning (IMP) was introduced to discover winning tickets. Finding well-performing sparse neural networks is especially interesting because of the potentially large reduction in memory footprint and overall computational burden. Combined, these may lead to a significant reduction in the energy required to perform the same task.

Deep Reinforcement Learning (DRL) has introduced algorithms capable of solving complex tasks (dynamic system control, Atari games, board games, ...). In this work we study the combination of deep reinforcement learning and the lottery ticket hypothesis. We focus on two algorithms, namely Double Deep Q-Networks (DDQN) and Soft-Actor-Critic (SAC), which both belong to the fruitful class of value-based methods. We provide the third independent confirmation, in the context of deep reinforcement learning, of the existence of subnetworks matching the Lottery Ticket Hypothesis using Iterative Magnitude Pruning. Our experiments were carried out on standard classic control as well as pixel-based environments. We provide experiments and guidelines regarding some important hyperparameters.

We suggest a potential ability of winning tickets to robustly preserve low-rank embeddings of the environment's state space. Some of our results suggest that tickets found using IMP seem closer than expected to subnetworks that could be found using so-called structured pruning methods. Our experiments also showcase the ability of winning tickets to render useless input variables inactive while keeping good performance on the task. This result, along with others, indicates a potential ability of winning tickets to be used as feature importance extractors. Finally, a variant of Iterative Magnitude Pruning is introduced, which we call pooled pruning. We suggest this variant could be beneficial for multi-network algorithms such as Soft-Actor-Critic.
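
As an illustration of the Iterative Magnitude Pruning loop named in the abstract, the following is a minimal sketch based on the commonly described procedure (train, remove the smallest-magnitude surviving weights, reset the survivors to their initial values, repeat). It is not the thesis's implementation: the function names `prune_smallest`, `iterative_magnitude_pruning` and `train_linear`, the toy linear regression task, and all hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np

def prune_smallest(weights, mask, fraction):
    """Deactivate the given fraction of surviving weights with the smallest magnitude."""
    flat_mask = mask.ravel().copy()
    alive = np.flatnonzero(flat_mask)            # indices of weights not yet pruned
    n_prune = int(fraction * alive.size)
    order = np.argsort(np.abs(weights.ravel()[alive]))
    flat_mask[alive[order[:n_prune]]] = False    # drop the n_prune smallest survivors
    return flat_mask.reshape(mask.shape)

def iterative_magnitude_pruning(init_weights, train, rounds, fraction):
    """Hypothetical IMP loop: train, prune by magnitude, rewind to the initial weights."""
    mask = np.ones(init_weights.shape, dtype=bool)
    for _ in range(rounds):
        # Each round restarts from the ORIGINAL initialization, restricted to the mask.
        trained = train(init_weights * mask, mask)
        mask = prune_smallest(trained, mask, fraction)
    return init_weights * mask, mask

# Toy task (an assumption, not from the thesis): only inputs 0 and 1 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
w_true = np.zeros(8)
w_true[0], w_true[1] = 3.0, -2.0
y = X @ w_true

def train_linear(weights, mask, steps=500, lr=0.1):
    """Masked gradient descent on a least-squares loss; pruned weights stay at zero."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad * mask
    return w

init = rng.normal(size=8)
ticket, mask = iterative_magnitude_pruning(init, train_linear, rounds=2, fraction=0.5)
print(mask)
```

In this toy setting, two rounds at a 50% pruning rate leave two of the eight weights active, and the surviving mask should cover only the two informative inputs. That loosely mirrors the abstract's remark about tickets rendering useless input variables inactive, although a linear model is far simpler than the DDQN and SAC networks studied in the thesis.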

File(s)

Document(s)

Master_thesis.pdf
Description:
Size: 83.42 MB
Format: Adobe PDF

abstract.pdf
Description:
Size: 242.27 kB
Format: Adobe PDF

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.
