Optimizing Latent Graph Representations of Surgical Scenes for Zero-Shot Domain Transfer
arXiv (2024)
Abstract
Purpose: Advances in deep learning have resulted in effective models for
surgical video analysis; however, these models often fail to generalize across
medical centers due to domain shift caused by variations in surgical workflow,
camera setups, and patient demographics. Recently, object-centric learning has
emerged as a promising approach for improved surgical scene understanding,
capturing and disentangling visual and semantic properties of surgical tools
and anatomy to improve downstream task performance. In this work, we conduct a
multi-centric performance benchmark of object-centric approaches, focusing on
Critical View of Safety assessment in laparoscopic cholecystectomy, then
propose an improved approach for unseen domain generalization.
Methods: We evaluate four object-centric approaches for domain
generalization, establishing baseline performance. Next, leveraging the
disentangled nature of object-centric representations, we dissect one of these
methods through a series of ablations (e.g. ignoring either visual or semantic
features for downstream classification). Finally, based on the results of these
ablations, we develop an optimized method specifically tailored for domain
generalization, LG-DG, that includes a novel disentanglement loss function.
Results: Our optimized approach, LG-DG, achieves an improvement of 9.28% over
the best baseline approach. More broadly, we show that object-centric
approaches are highly effective for domain generalization thanks to their
modular approach to representation learning.
Conclusion: We investigate the use of object-centric methods for unseen
domain generalization, identify method-agnostic factors critical for
performance, and present an optimized approach that substantially outperforms
existing methods.