Coarse-to-Fine Active Segmentation of Interactable Parts in Real Scene Images

arxiv(2023)

引用 1|浏览124
暂无评分
摘要
We introduce the first active learning (AL) framework for high-accuracy instance segmentation of dynamic, interactable parts from RGB images of real indoor scenes. As with most human-in-the-loop approaches, the key criterion for success in AL is to minimize human effort while still attaining high performance. To this end, we employ a transformer-based segmentation network that utilizes a masked-attention mechanism. To enhance the network, tailoring to our task, we introduce a coarse-to-fine model which first uses object-aware masked attention and then a pose-aware one, leveraging a correlation between interactable parts and object poses and leading to improved handling of multiple articulated objects in an image. Our coarse-to-fine active segmentation module learns both 2D instance and 3D pose information using the transformer, which supervises the active segmentation and effectively reduces human effort. Our method achieves close to fully accurate (96% and higher) segmentation results on real images, with 77% time saving over manual effort, where the training data consists of only 16.6% annotated real photographs. At last, we contribute a dataset of 2,550 real photographs with annotated interactable parts, demonstrating its superior quality and diversity over the current best alternative.
更多
查看译文
关键词
segmentation,interactable parts,real scene,coarse-to-fine
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要