Bistable Recurrent Cells and Belief Filtering for Q-learning in Partially Observable Markov Decision Processes
Lambrechts, Gaspard
Promotor(s) : Ernst, Damien
Date of defense : 24-Jun-2021/25-Jun-2021 • Permalink : http://hdl.handle.net/2268.2/11474
Details
Title : Bistable Recurrent Cells and Belief Filtering for Q-learning in Partially Observable Markov Decision Processes
Translated title : [fr] Cellules récurrentes bistables et filtrage de la distribution sur les états pour le Q-learning dans les processus de décisions markoviens partiellement observables
Author : Lambrechts, Gaspard
Date of defense : 24-Jun-2021/25-Jun-2021
Advisor(s) : Ernst, Damien
Committee member(s) : Louppe, Gilles; Drion, Guillaume; Bolland, Adrien
Language : English
Number of pages : 74
Keywords : [en] Reinforcement Learning; Belief Filtering; POMDP; Deep Recurrent Q-Network; DRQN; Online Fitted Q-Iteration; OFQI; RNN; Bistable Recurrent Cell; BRC; Q-Learning; Markov Decision Process; MDP; Bistability; Target Network; Partially Observable Markov Decision Process; Recurrent Neural Network; Mutual Information; RL
Discipline(s) : Engineering, computing & technology > Computer science
Target public : Researchers; Professionals of the domain; Students
Institution(s) : Université de Liège, Liège, Belgique
Degree : Master : ingénieur civil en science des données, à finalité spécialisée
Faculty : Master thesis of the Faculté des Sciences appliquées
Abstract
[en] In this master's thesis, reinforcement learning (RL) methods are used to learn (near-)optimal policies in several Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). More precisely, Q-learning and recurrent Q-learning techniques are used. Some of the considered POMDPs require a strong memorisation ability in order to achieve optimal decision making. In POMDPs, RL techniques usually rely on function approximators that take variable-length sequences of observations as input. Recurrent neural networks (RNNs) are thus a natural choice of approximator. This work builds on the recently introduced bistable recurrent cells, namely the bistable recurrent cell (BRC) and the recurrently neuromodulated BRC (nBRC), which have been empirically shown to provide significantly better long-term memory than standard cells such as the long short-term memory (LSTM) and the gated recurrent unit (GRU). First, by bringing these cells into the RL setting for the first time, it is empirically shown that they also provide a significant advantage in memory-demanding POMDPs, in comparison with the LSTM and the GRU. Second, the ability of the RNN to represent a belief distribution over the states of the POMDP is studied. This is achieved by evaluating the mutual information between the hidden states of the RNN and the belief filtered from the successive observations. This analysis is thus strongly anchored in information theory and in the theory of optimal control for POMDPs. Third, as a complement to this research project, a new target update is proposed for Q-learning algorithms with target networks, for both reactive and recurrent policies. This new update speeds up learning, especially in environments with sparse rewards.
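The belief referred to in the title and abstract is the filtered posterior distribution over the hidden states of the POMDP, given the history of actions and observations; the thesis compares the hidden states of the RNN against such filtered beliefs via mutual information. As a point of reference only, here is a minimal sketch of the standard exact belief filter for a finite POMDP; the function and array names (`belief_update`, `T`, `Z`) are illustrative assumptions and are not taken from the thesis code.

```python
import numpy as np

def belief_update(belief: np.ndarray, action: int, observation: int,
                  T: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """One step of exact belief filtering in a finite POMDP.

    belief: (S,) current distribution over states
    T: (A, S, S) transition model, T[a, s, s'] = P(s' | s, a)
    Z: (A, S, O) observation model, Z[a, s', o] = P(o | s', a)
    """
    predicted = belief @ T[action]                        # predict: sum_s b(s) T(s'|s,a)
    unnormalised = predicted * Z[action, :, observation]  # correct: weight by P(o | s', a)
    return unnormalised / unnormalised.sum()              # renormalise to a distribution

# Tiny two-state, one-action example: a noisy sensor for the current state.
T = np.array([[[0.9, 0.1], [0.1, 0.9]]])
Z = np.array([[[0.8, 0.2], [0.2, 0.8]]])
b = np.array([0.5, 0.5])
b = belief_update(b, action=0, observation=0, T=T, Z=Z)   # -> [0.8, 0.2], favours state 0
```

Repeating this update along a trajectory yields the sequence of beliefs against which the RNN hidden states can be compared.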
Cite this master thesis
APA
Lambrechts, G. (2021). Bistable Recurrent Cells and Belief Filtering for Q-learning in Partially Observable Markov Decision Processes. (Unpublished master's thesis). Université de Liège, Liège, Belgique. Retrieved from https://matheo.uliege.be/handle/2268.2/11474
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.