Towards accurate surgical workflow recognition with convolutional networks and transformers

Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization (2022)

Abstract
Recognising workflow phases from endoscopic surgical videos is crucial for deriving indicators that convey the quality, efficiency, and outcome of the surgery, and for offering insights into surgical team skills. Additionally, workflow information is used to organise large surgical video libraries for training purposes. In this paper, we explore different deep networks that capture spatial and temporal information from surgical videos for surgical workflow recognition. The approach is based on a combination of two networks: the first network extracts features from video snippets, and the second network performs action segmentation to identify the different parts of the surgical workflow by analysing the extracted features. This work focuses on proposing, comparing, and analysing different design choices, including fully convolutional, fully transformer, and hybrid models that use transformers in conjunction with convolutions. We evaluate the methods on a large dataset of endoscopic surgical videos acquired during Gastric Bypass surgery. Both our proposed fully transformer method and our fully convolutional approach achieve state-of-the-art results. By integrating transformers and convolutions, our hybrid model achieves 93% frame-level accuracy and a segmental edit distance score of 85. This demonstrates the potential of hybrid models that employ both transformers and convolutions for accurate surgical workflow recognition.
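The abstract describes a two-stage design: a feature extractor over video snippets followed by a temporal action-segmentation network built from convolutions, transformers, or both. The sketch below is not the authors' code; it is a minimal illustration of the hybrid variant, assuming snippet-level features (here of dimension `feat_dim`) have already been produced by a spatial backbone. All layer sizes, the number of phases, and the class name `HybridPhaseSegmenter` are hypothetical.

```python
# Minimal sketch of a hybrid temporal segmenter: local temporal convolutions
# followed by a transformer encoder for long-range context, then a per-snippet
# phase classifier. Hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn

class HybridPhaseSegmenter(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256, num_phases=8, num_layers=2):
        super().__init__()
        # 1D convolutions capture local temporal context around each snippet.
        self.temporal_conv = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # A transformer encoder models dependencies across the whole video.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, dim_feedforward=512, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Frame/snippet-level classification over workflow phases.
        self.classifier = nn.Linear(hidden, num_phases)

    def forward(self, features):
        # features: (batch, time, feat_dim) snippet features from a spatial backbone
        x = self.temporal_conv(features.transpose(1, 2)).transpose(1, 2)
        x = self.transformer(x)
        return self.classifier(x)  # (batch, time, num_phases)

# Example: predict phase logits for a video represented by 1000 snippet features.
model = HybridPhaseSegmenter()
video_features = torch.randn(1, 1000, 2048)
phase_logits = model(video_features)  # shape (1, 1000, 8)
```

In this kind of design, the convolutional stage smooths and localises the features while the transformer stage resolves long-range ordering of phases, which is one plausible reading of how a hybrid model could combine the strengths reported in the abstract.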
Keywords
Surgical workflow recognition, feature extraction, action segmentation, fully convolutional, fully transformer, hybrid models