Stochastic Economic Lot Scheduling via Self-Attention Based Deep Reinforcement Learning

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2023)

Abstract
The Stochastic Economic Lot Scheduling Problem (SELSP) is a difficult dynamic optimization problem with wide industrial applications. Traditional methods such as hyper-heuristics are designed manually and depend on substantial expert knowledge, which may limit their optimization performance. Deep Reinforcement Learning (DRL) has recently been shown to be promising for automatically learning scheduling policies for SELSP. However, its performance is still far from that of hyper-heuristics, owing to the lack of suitable deep models. In this paper, we propose a novel DRL method that learns dynamic scheduling policies for SELSP in an end-to-end fashion. Based on self-attention, our method can effectively extract useful features from raw state information and can handle different numbers of products, which previous methods cannot do. Experiments on a complex biopharmaceutical manufacturing process show that our method outperforms a recent DRL method and state-of-the-art hyper-heuristics. Moreover, the trained policy also performs better in environments that differ from the training setting, with demand forecast errors and varying numbers of products, showing strong robustness and generalization ability.

Note to Practitioners: The Stochastic Economic Lot Scheduling Problem (SELSP) is an important problem for manufacturing enterprises: the goal is to balance production and inventory so as to minimize the total cost. However, SELSP is very challenging to solve because it involves uncertain factors such as customer demand and machine failures. Traditional methods for solving SELSP, such as heuristic policies and hyper-heuristics, rely heavily on human experience in their design, and hence their performance can be limited. This paper proposes a Deep Reinforcement Learning (DRL) based method that automatically learns a scheduling policy for SELSP, which can alleviate this limitation through a self-attention based feature extraction mechanism and reward-based training. Experimental results on a realistic manufacturing process show that our method can deliver higher revenue than conventional manual policies and an existing DRL-based method.
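The abstract does not describe the network itself, so the following is only a minimal sketch of why a self-attention layer can score any number of products with one fixed set of weights. Everything in it is an assumption made for illustration: the class name `AttentionPolicy`, the three per-product features (inventory level, forecast demand, setup flag), the single attention head, and the random untrained weights. It is not the authors' architecture.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained policy would learn these instead."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class AttentionPolicy:
    """Hypothetical single-head self-attention scorer over products."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Weight shapes depend only on the feature size, never on the
        # number of products, so the same weights apply to any product count.
        self.w_q = random_matrix(hidden_dim, feat_dim, rng)
        self.w_k = random_matrix(hidden_dim, feat_dim, rng)
        self.w_v = random_matrix(hidden_dim, feat_dim, rng)
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]

    def action_probs(self, products):
        """Return one probability per product (which product to produce next)."""
        queries = [matvec(self.w_q, p) for p in products]
        keys = [matvec(self.w_k, p) for p in products]
        values = [matvec(self.w_v, p) for p in products]
        scale = math.sqrt(self.hidden_dim)
        logits = []
        for q in queries:
            # Scaled dot-product attention of this product over all products.
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
            weights = softmax(scores)
            context = [
                sum(w * v[d] for w, v in zip(weights, values))
                for d in range(self.hidden_dim)
            ]
            logits.append(sum(o * c for o, c in zip(self.w_out, context)))
        return softmax(logits)


# Illustrative raw states: [inventory level, forecast demand, setup flag].
policy = AttentionPolicy(feat_dim=3, hidden_dim=8)
three_products = [[5.0, 2.0, 1.0], [0.5, 3.0, 0.0], [8.0, 1.0, 0.0]]
five_products = three_products + [[2.0, 2.5, 0.0], [1.0, 4.0, 0.0]]
probs_three = policy.action_probs(three_products)
probs_five = policy.action_probs(five_products)
```

Because no positional encoding is used, reordering the products only reorders the output probabilities, and adding or removing a product changes the length of the output without requiring new weights. A fixed-size fully connected input layer would not have this property.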
Keywords
Production, Job shop scheduling, Metaheuristics, Costs, Dynamic scheduling, Reinforcement learning, Deep learning, Deep reinforcement learning, Stochastic economic lot scheduling, Self-attention