Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs
CoRR (2023)
Abstract
Camera traps are valuable tools in animal ecology for biodiversity monitoring
and conservation. However, challenges such as poor generalization when deployed
at new, unseen locations limit their practical application. Images are naturally
associated with heterogeneous forms of context, possibly in different
modalities. In this work, we leverage the structured context associated with
the camera trap images to improve out-of-distribution generalization for the
task of species identification in camera traps. For example, a photo of a wild
animal may be associated with information about where and when it was taken, as
well as structured biology knowledge about the animal species. While typically
overlooked by existing work, bringing back such context offers several
potential benefits for better image understanding, such as addressing data
scarcity and enhancing generalization. However, effectively integrating such
heterogeneous context into the visual domain is a challenging problem. To
address this, we propose a novel framework that reformulates species
classification as link prediction in a multimodal knowledge graph (KG). This
framework seamlessly integrates various forms of multimodal context for visual
recognition. We apply this framework to out-of-distribution species
classification on the iWildCam2020-WILDS and Snapshot Mountain Zebra datasets
and achieve performance competitive with state-of-the-art approaches.
Furthermore, our framework successfully incorporates biological taxonomy for
improved generalization and enhances sample efficiency for recognizing
under-represented species.
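
To make the reformulation concrete, the following is a minimal sketch of species identification as link prediction. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a scoring function, so a DistMult-style trilinear score is used as a stand-in; the entity names, relation names (`image_to_species`, `location_to_species`), embedding size, and the choice to sum scores across context links are all hypothetical; and random vectors stand in for a learned image encoder and learned embeddings.

```python
import random

random.seed(0)
DIM = 8  # hypothetical embedding size


def rand_vec(dim=DIM):
    """Random vector standing in for a learned embedding."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


# Hypothetical toy knowledge graph: species and one location as entities.
species = ["zebra", "impala", "leopard"]
entity_emb = {name: rand_vec() for name in species}
entity_emb["location_17"] = rand_vec()

# One embedding per (hypothetical) relation type.
relation_emb = {
    "image_to_species": rand_vec(),
    "location_to_species": rand_vec(),
}


def distmult_score(head, relation, tail):
    """DistMult-style plausibility of the triple (head, relation, tail)."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def predict_species(image_vec, location=None):
    """Rank candidate species by the score of the link from the image,
    optionally adding the score of the link from its location (an assumption)."""
    scores = {}
    for name in species:
        s = distmult_score(image_vec, relation_emb["image_to_species"], entity_emb[name])
        if location is not None:
            s += distmult_score(
                entity_emb[location], relation_emb["location_to_species"], entity_emb[name]
            )
        scores[name] = s
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


image_vec = rand_vec()  # stand-in for the output of an image encoder
ranked, scores = predict_species(image_vec, location="location_17")
print("predicted species:", ranked[0])
```

In this framing, classification amounts to asking which species entity most plausibly completes a triple whose head is the image. Further context such as time of capture or taxonomy could be added as more entities and relations without changing the prediction routine.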