ActFormer: Scalable Collaborative Perception via Active Queries
arXiv (2024)
Abstract
Collaborative perception leverages rich visual observations from multiple
robots to extend a single robot's perception ability beyond its field of view.
Many prior works receive messages broadcast from all collaborators, leading to
a scalability challenge when dealing with a large number of robots and sensors.
In this work, we aim to address scalable camera-based collaborative
perception with a Transformer-based architecture. Our key idea is to enable a
single robot to intelligently discern the relevance of the collaborators and
their associated cameras according to a learned spatial prior. This proactive
understanding of the visual features' relevance does not require the
transmission of the features themselves, enhancing both communication and
computation efficiency. Specifically, we present ActFormer, a Transformer that
learns bird's eye view (BEV) representations by using predefined BEV queries to
interact with multi-robot multi-camera inputs. Each BEV query can actively
select relevant cameras for information aggregation based on pose information,
instead of interacting with all cameras indiscriminately. Experiments on the
V2X-Sim dataset demonstrate that ActFormer improves the detection performance
from 29.89%,
showcasing the effectiveness of ActFormer in multi-agent collaborative 3D
object detection.
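The core idea above — each BEV query attends only to the cameras that could plausibly observe its location, determined from pose alone before any features are transmitted — can be illustrated with a minimal geometric sketch. This is an assumption-laden toy, not the paper's learned spatial prior: the function name `visible_cameras`, the camera pose format, and the simple field-of-view test are all illustrative.

```python
# Hypothetical sketch of active camera selection: a BEV query keeps only the
# cameras whose horizontal field of view covers its location, instead of
# attending to every camera from every robot. The FOV cone test stands in for
# ActFormer's learned relevance; it is an illustrative assumption.
import math

def visible_cameras(query_xy, cameras, fov_deg=90.0):
    """Return indices of cameras whose horizontal FOV contains query_xy.

    cameras: list of (cam_x, cam_y, yaw_rad) poses in the shared BEV frame.
    """
    selected = []
    half_fov = math.radians(fov_deg) / 2.0
    qx, qy = query_xy
    for idx, (cx, cy, yaw) in enumerate(cameras):
        bearing = math.atan2(qy - cy, qx - cx)        # direction to the query
        # wrap the angular difference into [-pi, pi)
        diff = (bearing - yaw + math.pi) % (2 * math.pi) - math.pi
        if abs(diff) <= half_fov:                     # query lies in the view cone
            selected.append(idx)
    return selected

# Example: two robots, one with forward- and rear-facing cameras.
cams = [(0.0, 0.0, 0.0),           # robot A, facing +x
        (0.0, 0.0, math.pi),       # robot A, facing -x
        (10.0, 0.0, math.pi / 2)]  # robot B, facing +y
print(visible_cameras((5.0, 0.0), cams))  # → [0]: only robot A's front camera
```

In the full model, each selected camera's features would then be aggregated into the BEV query via cross-attention; the point of the pose-based pre-selection is that irrelevant cameras are excluded before any feature transmission, which is where the communication savings come from.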