Markov Decision Process Design for Imitation of Optimal Task Schedulers.

SSP(2023)

Abstract
Due to the generally prohibitive computational requirements of optimal task schedulers, much of the field of task scheduling focuses on designing fast suboptimal algorithms. Since the tree search commonly used by sequencing algorithms such as Branch-and-Bound can naturally be framed as a Markov decision process, designing schedulers using imitation and reinforcement learning is a promising and active area of research. This paper demonstrates how policies can be trained on previously solved scheduling problems and successfully generalize to novel ones. Instead of focusing on policy design, however, this work focuses on designing the Markov decision process observation and reward functions to make learning as effective and efficient as possible. This can be of critical importance when training data is limited or when only simple, fast policies are practical. Various Markov decision process designs are introduced, and simulation examples demonstrate the resulting increases in policy performance, even without integration into search algorithms.
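To make the abstract's framing concrete, the following is a minimal sketch (not the paper's actual design) of how task sequencing can be cast as a Markov decision process: each action appends one unscheduled task to the schedule, the observation exposes per-task features, and the reward is shaped so that the cumulative return equals the negated weighted-completion-time objective. The task fields, observation layout, and reward shaping here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    duration: float  # processing time
    weight: float    # per-unit-time completion cost (assumed objective)

@dataclass
class SchedulingMDP:
    tasks: list
    scheduled: list = field(default_factory=list)  # indices in execution order
    clock: float = 0.0                             # elapsed processing time

    def observation(self):
        # One simple observation design: per-task features plus a
        # "already scheduled" flag, followed by the elapsed time.
        return [(t.duration, t.weight, i in self.scheduled)
                for i, t in enumerate(self.tasks)] + [self.clock]

    def step(self, action: int):
        # Action = index of an unscheduled task to execute next.
        assert action not in self.scheduled
        task = self.tasks[action]
        self.clock += task.duration
        self.scheduled.append(action)
        # Dense reward: negative weighted completion time of the chosen
        # task, so the episode return is the negated scheduling cost.
        reward = -task.weight * self.clock
        done = len(self.scheduled) == len(self.tasks)
        return self.observation(), reward, done

# Roll out a weighted-shortest-processing-time (WSPT) expert, which is
# optimal for this single-machine objective; its state-action pairs are
# the kind of supervision an imitation-learned policy would train on.
env = SchedulingMDP([Task(3, 1), Task(1, 2), Task(2, 1)])
total = 0.0
while len(env.scheduled) < len(env.tasks):
    remaining = [i for i in range(len(env.tasks)) if i not in env.scheduled]
    a = min(remaining, key=lambda i: env.tasks[i].duration / env.tasks[i].weight)
    _, r, done = env.step(a)
    total += r
```

Under this reward design, maximizing return is exactly minimizing the weighted sum of completion times, so a policy trained to imitate the expert's action choices inherits the objective without ever seeing the tree search that an exact solver such as Branch-and-Bound would perform.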
Keywords
Scheduling, imitation learning, Markov decision process, tree search