The Impact of Reward Shaping in Reinforcement Learning for Agent-based Microgrid Control

Computer Aided Chemical Engineering, 32nd European Symposium on Computer Aided Process Engineering (2022)

Abstract
In order to reduce CO2 emissions, electricity networks must increasingly integrate renewable energies. Microgrids are distributed electrical networks with their own generation and load, often supported by an electrical storage system; they can be connected to the external electrical network or operate in isolation. Since electricity consumption, prices, and renewable production are stochastic phenomena, microgrid control must adapt to uncertainty. Data-driven models, and in particular reinforcement learning (RL), have become efficient approaches to high-level microgrid control. RL algorithms are agent-based: the agent interacts with its environment and learns from a numerical reward signal. When the reward system is formulated, a certain behavior can implicitly be expected. For example, a reward system that encourages the agent to interact as little as possible with the external network explicitly increases the autonomy of the microgrid; implicitly, the agent can be expected to schedule the battery so as to maximize the ratio of renewable energy used to the amount producible. A Q-learning algorithm is used because of its performance in discrete action spaces, which simplifies the benchmark. An agent is trained with different reward functions commonly found in the literature on data-driven microgrid control; the agent parameters do not vary from one case study to another. Indicators are set up to evaluate the agent's behavior, based on the implicit behavioral criteria in the definition of the reward system, such as the ratio of renewable energy used and the amount of energy stored during peak hours. This study provides a way to rationalize the choice of a reward system that controls the microgrid in a near-optimal way while meeting implicit secondary objectives. It could also inform the choice of weighting coefficients in a combination of reward functions.
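The reward-shaping idea in the abstract can be sketched with tabular Q-learning on a toy microgrid. Everything below is an illustrative assumption, not the paper's actual setup: the discretized state (battery level, net load), the three-action space, the one-unit battery dynamics, and the autonomy-oriented reward that penalizes any exchange with the external grid. Swapping `reward_autonomy` for another shaping (e.g. an electricity-cost penalty) is how the paper's comparison across reward functions would be reproduced.

```python
import random
from collections import defaultdict

# Hypothetical discrete action space for the battery (not from the paper).
ACTIONS = ("charge", "discharge", "idle")

def reward_autonomy(grid_exchange):
    """Shaping that penalizes any power exchanged with the external grid."""
    return -abs(grid_exchange)

def step(battery, net_load, action, capacity=5):
    """Toy transition: battery in 0..capacity, net_load = load - renewables."""
    if action == "charge" and battery < capacity:
        battery += 1
        grid = net_load + 1   # charging draws one extra unit from the grid
    elif action == "discharge" and battery > 0:
        battery -= 1
        grid = net_load - 1   # discharging offsets one unit of net load
    else:
        grid = net_load       # idle (or infeasible action): grid covers net load
    return battery, grid

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning over (battery, net_load) states for 24-step episodes."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        battery = rng.randint(0, 5)
        net_load = rng.choice((-1, 0, 1))        # stochastic load minus renewables
        for _ in range(24):                      # one day of hourly decisions
            state = (battery, net_load)
            if rng.random() < eps:               # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            battery, grid = step(battery, net_load, action)
            r = reward_autonomy(grid)
            net_load = rng.choice((-1, 0, 1))    # next stochastic disturbance
            best_next = max(Q[((battery, net_load), a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
    return Q
```

With this shaping the greedy policy learns the behavior the abstract calls implicit: when renewable production exceeds load (negative net load) it charges the battery, and during deficits it discharges, keeping grid exchange near zero.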