Forest Graph Convolutional Network for Surgical Action Triplet Recognition in Endoscopic Videos

IEEE Transactions on Circuits and Systems for Video Technology (2022)

Abstract
Recognizing surgical activities in endoscopic videos is essential for developing context-aware decision support in the operating room. In this work, we model each surgical activity as an action triplet consisting of the surgical instrument, the action, and the target organ that the instrument interacts with. The goal is to recognize these action triplets from endoscopic videos. Correctly recognizing fine-grained activity triplets is challenging, however, because the triplet classes follow a long-tail distribution and because complex associations exist both between triplets and within each triplet. In addition, multiple triplets may appear in a single video frame. To address these challenges, we propose a new model for surgical action triplet recognition based on a classification forest and a Graph Convolutional Network (GCN), which we call Forest GCN. The classification forest calibrates the fine-grained triplet classifiers with their upstream parent classifiers, suppressing noisy logits of triplet classes in the long tail. Stacked GCNs model the dependencies between triplet classes while leveraging language embeddings. Experiments on the CholecT50 endoscopic video dataset demonstrate that the proposed method outperforms current state-of-the-art methods on surgical action triplet recognition.
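The abstract names two mechanisms without giving formulas: parent classifiers that calibrate triplet logits, and stacked GCNs over label embeddings. The following is a minimal sketch of how the two ideas could fit together, not the authors' implementation. It makes several assumptions: the multiplicative gating rule, the toy adjacency matrix, the layer sizes, and the random vectors standing in for language embeddings and frame features are all hypothetical, and the GCN uses the standard symmetric normalization rather than anything confirmed by the paper.

```python
import numpy as np

def normalize_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
num_triplets, num_parents, embed_dim, feat_dim = 4, 2, 8, 16

# Random stand-ins for per-class language embeddings (assumption).
label_embeddings = rng.normal(size=(num_triplets, embed_dim))

# Toy dependency graph: triplets 0-1 are related, as are 2-3 (assumption).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
a_norm = normalize_adjacency(adj)

# Two stacked GCN layers map label embeddings to per-class classifier vectors.
w1 = rng.normal(size=(embed_dim, 12)) * 0.1
w2 = rng.normal(size=(12, feat_dim)) * 0.1
hidden = np.maximum(a_norm @ label_embeddings @ w1, 0.0)   # ReLU
classifiers = a_norm @ hidden @ w2                         # (num_triplets, feat_dim)

# Score one frame; each class is scored independently (multi-label setting).
frame_feature = rng.normal(size=feat_dim)                  # stand-in visual feature
triplet_logits = classifiers @ frame_feature
triplet_probs = sigmoid(triplet_logits)

# Hypothetical calibration: gate each triplet by its parent's probability,
# e.g. the parent could be the instrument class the triplet belongs to.
parent_logits = rng.normal(size=num_parents)
parent_of = np.array([0, 0, 1, 1])                         # triplet -> parent index
calibrated = triplet_probs * sigmoid(parent_logits[parent_of])
```

Under this assumed gating rule, a triplet whose parent class scores low has its probability scaled down, which is one simple way to read "suppress noisy logits of the triplet classes in the long tail".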
Keywords
Classification forest, graph convolutional network, surgical action recognition, endoscopic videos