Multi-Agent Deep Reinforcement Learning for Persistent Monitoring With Sensing, Communication, and Localization Constraints

Manav Mishra, Prithvi Poddar, Rajat Agrawal, Jingxi Chen, Pratap Tokekar, P. B. Sujit

IEEE Transactions on Automation Science and Engineering (2024)

Abstract
Determining multi-robot motion policies for persistently monitoring a region under limited sensing, communication, and localization constraints in non-GPS environments is a challenging problem. To account for the localization constraints, in this paper, we consider a heterogeneous robotic system consisting of two types of agents: anchor agents with accurate localization capability and auxiliary agents with low localization accuracy. To localize itself, an auxiliary agent must be within the communication range of an anchor, directly or indirectly. The robotic team's objective is to minimize environmental uncertainty through persistent monitoring. We propose a multi-agent deep reinforcement learning (MARL) based architecture with graph convolution, called Graph Localized Proximal Policy Optimization (GALOPP), which incorporates the agents' limited sensor field-of-view, communication, and localization constraints along with the persistent monitoring objective to determine a motion policy for each agent. We evaluate GALOPP on open maps with obstacles, using varying numbers of anchor and auxiliary agents. We further study 1) the effect of communication range, obstacle density, and sensing range on performance and 2) compare GALOPP with area partition, greedy search, random search, and random search with communication constraint strategies. To assess its generalization capability, we also evaluate GALOPP in two additional environments, a 2-room and a 4-room map. The results show that GALOPP learns effective policies and monitors the area well. As a proof of concept, we perform hardware experiments to demonstrate the performance of GALOPP. Note to Practitioners — Persistent monitoring is performed in various applications such as search and rescue, border patrol, and wildlife monitoring. These applications are typically large-scale, and hence using a multi-robot system helps achieve the mission objectives effectively.
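The localization rule stated above, that an auxiliary agent is localized only if it can reach an anchor directly or through a chain of in-range agents, amounts to a reachability check on the communication graph. A minimal sketch of that check, assuming 2-D positions and a common communication range (the function and parameter names are illustrative, not from the paper):

```python
from collections import deque

def localized_agents(positions, anchors, comm_range):
    """Return the set of agent indices that can localize themselves.

    Anchors are always localized; an auxiliary agent is localized if it
    connects to some anchor through a chain of agents, each hop within
    comm_range (multi-hop connectivity, found here by BFS).
    """
    n = len(positions)

    def in_range(i, j):
        dx = positions[i][0] - positions[j][0]
        dy = positions[i][1] - positions[j][1]
        return (dx * dx + dy * dy) ** 0.5 <= comm_range

    localized = set(anchors)
    queue = deque(anchors)
    while queue:
        u = queue.popleft()
        for v in range(n):
            if v not in localized and in_range(u, v):
                localized.add(v)
                queue.append(v)
    return localized
```

For example, with one anchor at the origin and a communication range of 2, an auxiliary agent at (2.5, 0) is localized only because an intermediate agent at (1, 0) relays connectivity, while an agent at (10, 0) remains unlocalized.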
Often, the robots are subject to limited sensing and communication ranges, and they may need to operate in GPS-denied areas. In such scenarios, developing motion planning policies for the robots is difficult. Due to the lack of GPS, alternative localization mechanisms, such as SLAM, high-accuracy INS, or UWB radio, are essential. Since SLAM or a highly accurate INS system is expensive, we use a combination of agents with expensive, accurate localization systems (anchor agents) and agents with low-cost INS (auxiliary agents) whose localization can be made accurate using cooperative localization techniques. To determine efficient motion policies, we use a multi-agent deep reinforcement learning technique (GALOPP) that takes the heterogeneity in vehicle localization capability, limited sensing, and communication constraints into account. GALOPP is evaluated in simulation and compared with baselines: random search, random search with ensured communication, greedy search, and area partitioning. The results show that GALOPP outperforms the baselines. The GALOPP approach offers a generic solution that can be adapted to various other applications.
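As a rough illustration of the graph-convolution component named in the abstract, the sketch below implements one standard symmetric-normalized message-passing layer over the agents' communication graph; this is a generic GCN layer under our own assumptions, not the paper's exact GALOPP architecture, and all names are illustrative:

```python
import numpy as np

def gcn_layer(features, adjacency, weights):
    """One graph-convolution step: each agent aggregates its neighbors'
    feature vectors over the communication graph (with self-loops and
    symmetric degree normalization), then applies a shared linear map
    followed by a ReLU nonlinearity."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)                # D^{-1/2}
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt           # normalized adjacency
    return np.maximum(0.0, norm @ features @ weights)
```

Stacking such layers lets information from distant agents propagate one communication hop per layer, which matches the multi-hop connectivity constraint described above.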
Keywords
Multi-agent deep reinforcement learning (MARL), persistent monitoring (PM), graph neural networks