Training effective deep reinforcement learning agents for real-time life-cycle production optimization

Journal of Petroleum Science and Engineering (2022)

Abstract
Life-cycle production optimization aims to find the optimal well control scheme at each control time step so as to maximize financial profit and hydrocarbon production. However, searching for the optimal policy with a limited number of simulation evaluations is challenging. In this paper, a novel production optimization method is presented that maximizes the net present value (NPV) over the entire life cycle and adjusts the well control scheme in real time. The proposed method models the life-cycle production optimization problem as a finite-horizon Markov decision process (MDP), in which the well control scheme is viewed as a sequence of decisions. Soft actor-critic, a state-of-the-art model-free deep reinforcement learning (DRL) algorithm, is then used to train DRL agents that solve this MDP. The DRL agent strives to maximize long-term NPV rewards as well as the randomness (entropy) of the control scheme by training a stochastic policy that maps reservoir states to well control variables and an action-value function that estimates the objective value of the current policy. Because the trained policy is an explicit function, the DRL agent can adjust the well control scheme in real time under different reservoir states. Unlike most existing methods, which introduce task-specific sensitive parameters or construct complex supplementary structures, the DRL agent learns adaptively by executing goal-directed interactions with an uncertain reservoir environment and making use of accumulated well control experience, which resembles how wells are controlled in the field. The key insight is the DRL method's ability to utilize gradient information (well-control experience) for higher sample efficiency. Simulation results on two reservoir models indicate that, compared to other optimization methods, the proposed method attains higher NPV and achieves excellent oil displacement performance.
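To make the MDP framing concrete, the following minimal sketch shows one common way a life-cycle NPV can be split into per-step rewards, so that the sum of rewards over a finite-horizon episode equals the NPV. The abstract does not give the reward definition, so the cash-flow formula, the `Economics` class, the function names, and all prices and rates below are illustrative assumptions rather than the authors' formulation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Economics:
    """Hypothetical economic parameters (assumed, not taken from the paper)."""
    oil_price: float = 70.0            # USD per barrel of oil produced
    water_cost: float = 5.0            # USD per barrel of water produced
    injection_cost: float = 5.0        # USD per barrel of water injected
    annual_discount_rate: float = 0.1  # fraction per year


def step_reward(oil: float, water: float, injected: float,
                t_end_days: float, econ: Economics) -> float:
    """Discounted cash flow of one control step, usable as an MDP reward.

    Volumes are the totals produced or injected during the step, and
    t_end_days is the elapsed time at the end of the step.
    """
    cash = (econ.oil_price * oil
            - econ.water_cost * water
            - econ.injection_cost * injected)
    discount = (1.0 + econ.annual_discount_rate) ** (t_end_days / 365.0)
    return cash / discount


def life_cycle_npv(steps: Sequence[Tuple[float, float, float, float]],
                   econ: Economics) -> float:
    """Sum the step rewards over the finite horizon (the episode return)."""
    return sum(step_reward(o, w, i, t, econ) for o, w, i, t in steps)


if __name__ == "__main__":
    # Two hypothetical one-year control steps: (oil, water, injected, t_end_days).
    schedule = [(1000.0, 200.0, 1500.0, 365.0),
                (800.0, 400.0, 1500.0, 730.0)]
    print(f"NPV of the toy schedule: {life_cycle_npv(schedule, Economics()):.2f} USD")
```

Under this assumed decomposition, an agent that maximizes the undiscounted sum of step rewards is maximizing the life-cycle NPV, because the economic discounting is already folded into each reward.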
Keywords
Production optimization, Deep reinforcement learning, Optimal control, Goal-directed interaction, Model free