A prescriptive Dirichlet power allocation policy with deep reinforcement learning

Reliability Engineering & System Safety(2022)

引用 5|浏览14
暂无评分
摘要
Prescribing optimal operation based on the condition of the system, and thereby potentially prolonging its remaining useful lifetime, has tremendous potential in terms of actively managing the availability, maintenance, and costs of complex systems. Reinforcement learning (RL) algorithms are particularly suitable for this type of problem given their learning capabilities. A special case of a prescriptive operation is the power allocation task, which can be considered as a sequential allocation problem whereby the action space is bounded by a simplex constraint. A general continuous action-space solution of such sequential allocation problems has still remained an open research question for RL algorithms. In continuous action space, the standard Gaussian policy applied in reinforcement learning does not support simplex constraints, while the Gaussian-softmax policy introduces a bias during training. In this work, we propose the Dirichlet policy for continuous allocation tasks and analyze the bias and variance of its policy gradients. We demonstrate that the Dirichlet policy is bias-free and provides significantly faster convergence, better performance, and better robustness to hyperparameter changes as compared to the Gaussian-softmax policy. Moreover, we demonstrate the applicability of the proposed algorithm on a prescriptive operation case in which we propose the Dirichlet power allocation policy and evaluate its performance on a case study of a set of multiple lithium-ion (Li-I) battery systems. The experimental results demonstrate the potential to prescribe optimal operation, improving the efficiency and sustainability of multi-power source systems.
更多
查看译文
关键词
Reinforcement learning,Deep learning,Prescriptive operation,Multi-power source systems
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要