Master's Thesis: Deep Reinforcement Learning with Applications to the Renewable Energy Transition
Bolland, Adrien
Promotor(s): Ernst, Damien; Wehenkel, Louis
Date of defense: 25-Jun-2020/26-Jun-2020 • Permalink: http://hdl.handle.net/2268.2/8742
Details
Title: Master's Thesis: Deep Reinforcement Learning with Applications to the Renewable Energy Transition
Author: Bolland, Adrien
Date of defense: 25-Jun-2020/26-Jun-2020
Advisor(s): Ernst, Damien; Wehenkel, Louis
Committee's member(s): Boukas, Ioannis; Vecoven, Nicolas; Wehenkel, Louis
Language: English
Discipline(s): Engineering, computing & technology > Electrical & electronics engineering
Complementary URL: https://arxiv.org/abs/2004.05940 ; https://arxiv.org/abs/2006.01738
Institution(s): Université de Liège, Liège, Belgique
Degree: Master: ingénieur civil électricien, à finalité spécialisée en "signal processing and intelligent robotics"
Faculty: Master thesis of the Faculté des Sciences appliquées
Abstract
The large-scale integration of variable energy resources is expected to shift a large proportion of energy exchanges closer to real-time, where more accurate forecasts are available. In this context, short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component for the successful integration of renewable energy sources is the use of energy storage. In the first part of this work, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday energy market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profit received over the entire trading horizon while respecting the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous distributed version of the fitted Q iteration algorithm is chosen for solving this problem owing to its sample efficiency. The large and variable number of orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories, addressing exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that is the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.
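For readers unfamiliar with the method, the following is a minimal sketch of classical batch fitted Q iteration with tree-based regression; the thesis itself uses an asynchronous distributed variant with problem-specific high-level actions and an alternative state representation, and every name below (fitted_q_iteration, transitions, actions) is illustrative rather than taken from the thesis.

    # Minimal fitted Q iteration sketch on a fixed batch of transitions.
    # The thesis uses an asynchronous distributed variant; this plain version
    # only illustrates the iterative Bellman regression at the core of FQI.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    def fitted_q_iteration(transitions, actions, n_iterations, gamma=0.99):
        """transitions: list of (state, action, reward, next_state) tuples,
        with states and actions encoded as 1-D numpy arrays.
        actions: list of all candidate action encodings (1-D numpy arrays)."""
        x = np.array([np.concatenate([s, a]) for s, a, _, _ in transitions])
        r = np.array([rew for _, _, rew, _ in transitions])
        next_states = [sp for _, _, _, sp in transitions]

        q = None
        for _ in range(n_iterations):
            if q is None:
                targets = r  # first iteration: Q_1 equals the immediate reward
            else:
                # Bellman backup: r + gamma * max_a Q_{k-1}(s', a)
                q_next = np.column_stack([
                    q.predict(np.array([np.concatenate([sp, a])
                                        for sp in next_states]))
                    for a in actions
                ])
                targets = r + gamma * q_next.max(axis=1)
            q = ExtraTreesRegressor(n_estimators=50).fit(x, targets)
        return q

A greedy policy is then recovered by evaluating the returned Q-function over the candidate actions in a given state and picking the maximizer.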
In the second part of this work, we generalise direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). It can be used to jointly learn a Reinforcement Learning (RL) environment and a policy with maximal expected return over a joint hypothesis space of environments and policies. We illustrate the performance of DESGA on two benchmarks. First, we consider a parametric space of mass-spring-damper environments. Then, we use our algorithm to jointly optimize the sizing of the components and the operation of a small-scale, autonomous energy system, i.e. a solar off-grid microgrid composed of photovoltaic panels, batteries, etc. The results highlight the strong performance of the DESGA algorithm.
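As a rough illustration of the idea, the sketch below runs projected stochastic gradient ascent over a single joint vector of environment and policy parameters. For simplicity it substitutes a two-point zeroth-order gradient estimate for the stochastic gradient derived in the work, and expected_return, params0 and the box feasible set are hypothetical placeholders, not the thesis's formulation.

    # Schematic projected stochastic gradient ascent over joint
    # (environment, policy) parameters, in the spirit of DESGA.
    # The two-point zeroth-order estimate below is a stand-in for the
    # stochastic gradient used in the original work.
    import numpy as np

    def desga_style_search(expected_return, params0, bounds, steps=1000,
                           lr=1e-2, smoothing=1e-3, seed=0):
        """expected_return: params -> Monte-Carlo estimate of the return J.
        params0: initial joint vector [environment params, policy params].
        bounds: (low, high) arrays defining a box feasible set."""
        rng = np.random.default_rng(seed)
        params = np.array(params0, dtype=float)
        low, high = bounds
        for _ in range(steps):
            u = rng.standard_normal(params.shape)  # random search direction
            # Two-point estimate of the directional derivative of J along u
            j_plus = expected_return(params + smoothing * u)
            j_minus = expected_return(params - smoothing * u)
            grad_est = (j_plus - j_minus) / (2.0 * smoothing) * u
            params = params + lr * grad_est        # gradient ascent step
            params = np.clip(params, low, high)    # projection onto the box
        return params

The projection step is what keeps the jointly optimized environment (e.g. component sizes of the microgrid) inside its physically feasible set while the policy parameters are updated with the same rule.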