OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
CVPR 2024
Abstract
Event-based semantic segmentation (ESS) is a fundamental yet challenging task
for event camera sensing. The difficulties in interpreting and annotating event
data limit its scalability. While domain adaptation from images to event data
can help to mitigate this issue, there exist data representational differences
that require additional effort to resolve. In this work, for the first time, we
synergize information from image, text, and event-data domains and introduce
OpenESS to enable scalable ESS in an open-world, annotation-efficient manner.
We achieve this goal by transferring the semantically rich CLIP knowledge from
image-text pairs to event streams. To pursue better cross-modality adaptation,
we propose a frame-to-event contrastive distillation and a text-to-event
semantic consistency regularization. Experimental results on popular ESS
benchmarks show that our approach outperforms existing methods. Notably, we
achieve 53.93
event or frame labels.
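
To make the idea of frame-to-event contrastive distillation more concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss between paired event and frame features. It is not the authors' formulation: the function names, the temperature value of 0.07, and the assumption that features are pooled into one nonzero vector per region are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(event_feats, frame_feats, temperature=0.07):
    """Mean InfoNCE loss over paired features.

    Assumption: event_feats[i] and frame_feats[i] describe the same region
    (the positive pair); every other frame feature acts as a negative.
    """
    ev = [l2_normalize(v) for v in event_feats]
    fr = [l2_normalize(v) for v in frame_feats]
    total = 0.0
    for i, e in enumerate(ev):
        # Cosine similarity of this event feature to every frame feature.
        logits = [sum(a * b for a, b in zip(e, f)) / temperature for f in fr]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the positive pair.
        total += log_denom - logits[i]
    return total / len(ev)


# Toy check: matching pairs should give a much lower loss than swapped pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In a setup like the one the abstract describes, the frame features would come from a frozen image encoder and only the event branch would be trained to minimize this loss; that division of roles is likewise an assumption here, not a detail confirmed by the abstract.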