Faculté des Sciences appliquées

Master's Thesis : Deep Reinforcement Learning with Applications to the Renewable Energy Transition

Bolland, Adrien ULiège
Promotor(s) : Ernst, Damien ULiège ; Wehenkel, Louis ULiège
Date of defense : 25-Jun-2020/26-Jun-2020
Committee's member(s) : Boukas, Ioannis ULiège
Vecoven, Nicolas ULiège
Wehenkel, Louis ULiège
Language : English
Discipline(s) : Engineering, computing & technology > Electrical & electronics engineering
Institution(s) : Université de Liège, Liège, Belgium
Degree: Master's degree in electrical engineering (ingénieur civil électricien), with a specialised focus in "signal processing and intelligent robotics"
Faculty: Master thesis of the Faculté des Sciences appliquées


[en] The large-scale integration of variable energy resources is expected to shift a large proportion of energy exchanges closer to real time, where more accurate forecasts are available. In this context, short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component for the successful integration of renewable energy sources is the use of energy storage. In the first part of this work, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday energy market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profits received over the entire trading horizon while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous distributed version of the fitted Q iteration algorithm is chosen for solving this problem owing to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that is the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.
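
As a rough illustration of the fitted Q iteration idea mentioned above, the following sketch applies a plain, sequential version of the algorithm to a tiny storage-arbitrage problem. Everything in it is an assumption made for illustration and is not taken from the thesis: the price sequence, the unit-capacity storage, the three actions, the undiscounted finite horizon, and the averaging "regressor" that stands in for a real function approximator. It does not model the order book, and it is not the asynchronous distributed variant used in the work.

```python
import random
from collections import defaultdict

# Hypothetical toy storage problem (not the thesis model):
# state = (time step, stored energy), actions = discharge / idle / charge.
PRICES = [10.0, 30.0, 10.0, 30.0]   # assumed price per unit at each step
CAPACITY = 1                         # assumed storage capacity in units
ACTIONS = (-1, 0, 1)                 # discharge, idle, charge


def step(state, action):
    """Return (reward, next_state); infeasible actions are treated as idle."""
    t, level = state
    new_level = level + action
    if not 0 <= new_level <= CAPACITY:
        new_level, action = level, 0
    reward = -action * PRICES[t]     # buying costs money, selling earns it
    return reward, (t + 1, new_level)


def collect_batch(n_episodes, rng):
    """Generate transitions (s, a, r, s') with a uniformly random policy."""
    batch = []
    for _ in range(n_episodes):
        state = (0, 0)
        for _ in range(len(PRICES)):
            action = rng.choice(ACTIONS)
            reward, next_state = step(state, action)
            batch.append((state, action, reward, next_state))
            state = next_state
    return batch


def fit_average(samples):
    """Stand-in regressor: average the targets observed for each (s, a)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, target in samples:
        sums[key] += target
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def fitted_q_iteration(batch, n_iterations):
    """Repeatedly regress Q on the targets r + max_a' Q(s', a')."""
    q = {}                            # Q_0 is identically zero
    for _ in range(n_iterations):
        samples = []
        for state, action, reward, next_state in batch:
            terminal = next_state[0] == len(PRICES)
            best_next = 0.0 if terminal else max(
                q.get((next_state, a), 0.0) for a in ACTIONS)
            samples.append(((state, action), reward + best_next))
        q = fit_average(samples)
    return q


def greedy_return(q):
    """Total revenue collected by acting greedily with respect to q."""
    state, total = (0, 0), 0.0
    for _ in range(len(PRICES)):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        reward, state = step(state, action)
        total += reward
    return total
```

Calling `fitted_q_iteration(collect_batch(500, random.Random(0)), len(PRICES))` and passing the result to `greedy_return` evaluates the learned policy on the toy problem. The batch is collected once and reused at every iteration, which is the property that makes this family of methods sample efficient.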

In the second part of this work, we generalise the direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). The latter can be used to jointly learn a Reinforcement Learning (RL) environment and a policy with maximal expected return over a joint hypothesis space of environments and policies. We illustrate the performance of DESGA on two benchmarks. First, we consider a parametric space of mass-spring-damper environments. Then, we use our algorithm for optimizing the size of the components and the operation of a small-scale, autonomous energy system, i.e. a solar off-grid microgrid composed of photovoltaic panels, batteries, etc. The results highlight the excellent performance of the DESGA algorithm.
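
The idea of jointly ascending over environment and policy parameters can be sketched on a one-step toy problem. This is only an illustration under stated assumptions and not the DESGA implementation from the thesis: the scalar system, the quadratic design cost, the bounds on the design parameter, and the use of an analytic per-episode gradient are all hypothetical choices. Here `psi` plays the role of an environment (design) parameter constrained to an interval, and `theta` is the gain of a linear policy.

```python
import random

PSI_MIN, PSI_MAX = 0.2, 1.0   # assumed feasible range of the design parameter
DESIGN_COST = 0.5             # assumed weight of the quadratic design cost


def episode_return(psi, theta, x0):
    """One-step episode: x1 = (1 - psi) * x0 + u with u = -theta * x0."""
    x1 = (1.0 - psi - theta) * x0
    return -x1 ** 2 - DESIGN_COST * psi ** 2


def episode_gradient(psi, theta, x0):
    """Analytic gradient of episode_return with respect to (psi, theta)."""
    residual = 1.0 - psi - theta
    d_theta = 2.0 * residual * x0 ** 2
    d_psi = d_theta - 2.0 * DESIGN_COST * psi
    return d_psi, d_theta


def project(psi):
    """Project the design parameter back onto its feasible interval."""
    return min(PSI_MAX, max(PSI_MIN, psi))


def joint_projected_sga(psi, theta, n_steps, step_size, rng):
    """Stochastic gradient ascent on (psi, theta), projecting psi each step."""
    for _ in range(n_steps):
        x0 = rng.uniform(-1.0, 1.0)      # sampled initial state
        d_psi, d_theta = episode_gradient(psi, theta, x0)
        psi = project(psi + step_size * d_psi)
        theta = theta + step_size * d_theta
    return psi, theta
```

In this toy setting the design cost pushes `psi` down to its lower bound, where the projection holds it, and the policy gain adapts to the resulting environment. A call such as `joint_projected_sga(0.8, 0.0, 5000, 0.05, random.Random(0))` returns the pair of learned parameters.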



Access master_thesis.pdf
Size: 1.74 MB
Format: Adobe PDF
Access summary.pdf
Size: 74.14 kB
Format: Adobe PDF


  • Bolland, Adrien ULiège Université de Liège > Master ingé. civ. électr., à fin.


Committee's member(s)

  • Boukas, Ioannis ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart-Microgrids
  • Vecoven, Nicolas ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
  • Wehenkel, Louis ULiège Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Méthodes stochastiques

All documents available on MatheO are protected by copyright and subject to the usual rules for fair use.
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.