S4RoboFormer: Scribble-Supervised Surgical Robotic Segmentation Transformer via Augmented Consistency Training

Ziyang Wang*, Tianxiang Chen, Zi Ye, Yiyuan Ge, Zhihao Chen, Jiabao Li, Yifan Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Advancements in deep learning for surgical instrument segmentation have notably improved the proficiency, safety, and efficacy of minimally invasive robotic surgeries. The effectiveness of deep learning, however, is contingent upon the availability of large datasets for training, which are often associated with substantial annotation costs. Given the dynamic nature of surgical robots, scribble-based labeling emerges as a more viable and cost-effective alternative to traditional pixel-wise dense labeling. This paper introduces the Scribble-Supervised Surgical Robotic Segmentation Transformer (S4RoboFormer), designed to mitigate the challenges posed by resource-intensive annotations. S4RoboFormer incorporates a Vision Transformer (ViT)-based U-shaped segmentation network, enhanced with a specialized Weakly-Supervised Learning (WSL) strategy that comprises consistency training through (i) data-based perturbation using a data-mixed interpolation technique, and (ii) network-based perturbation via a self-ensembling strategy. This methodology promotes uniform predictions across different levels of perturbation under conditions of limited-signal supervision. S4RoboFormer outperforms existing state-of-the-art baseline WSL frameworks with both convolutional neural network (CNN)- and ViT-based segmentation networks on a pre-processed public dataset. The code of S4RoboFormer, all baseline methods, the pre-processed data, and the scribble simulation algorithm are publicly available at https://github.com/ziyangwang007/CV-WSL-Robot.
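
The abstract names three training signals without giving their exact form: supervision from sparse scribbles, consistency under data-mixed interpolation, and consistency under a self-ensembled network. The sketch below is one plausible reading in NumPy, not the authors' implementation. It assumes a partial cross-entropy that ignores unlabelled pixels, a mixup-style linear interpolation of two inputs, and a mean-teacher-style exponential moving average (EMA) of weights. The names `IGNORE`, `partial_cross_entropy`, `mixed_consistency`, and `ema_update`, and the values 255 and 0.99, are hypothetical choices made for illustration.

```python
import numpy as np

IGNORE = 255  # hypothetical label value for pixels that carry no scribble


def partial_cross_entropy(probs, scribble, ignore=IGNORE):
    """Mean cross-entropy over scribble-annotated pixels only.

    probs: (C, H, W) class probabilities; scribble: (H, W) integer labels.
    """
    mask = scribble != ignore
    if not mask.any():
        return 0.0
    labels = scribble[mask]                                   # (N,)
    picked = probs[:, mask][labels, np.arange(labels.size)]   # (N,)
    return float(-np.log(picked + 1e-8).mean())


def mix(a, b, lam):
    """Linear interpolation of two arrays (assumed mixup-style mixing)."""
    return lam * a + (1.0 - lam) * b


def mixed_consistency(student, teacher, x1, x2, lam):
    """Data-based perturbation: the student's prediction on the mixed input
    should match the same mix of the teacher's predictions."""
    target = mix(teacher(x1), teacher(x2), lam)
    pred = student(mix(x1, x2, lam))
    return float(np.mean((pred - target) ** 2))


def ema_update(teacher_w, student_w, alpha=0.99):
    """Network-based perturbation via self-ensembling (assumed EMA form):
    teacher weights track a moving average of student weights."""
    return {k: alpha * teacher_w[k] + (1.0 - alpha) * student_w[k]
            for k in student_w}
```

In a training loop under these assumptions, the total loss would be the partial cross-entropy plus a weighted `mixed_consistency` term, with `ema_update` applied to the teacher after each student optimisation step. The repository linked in the abstract remains the authoritative reference for the actual losses and schedules.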

Original language: English
Pages (from-to): 1789-1793
Number of pages: 5
Journal: IEEE Transactions on Medical Robotics and Bionics
Volume: 7
Issue number: 4
Early online date: 29 Aug 2025
Publication status: E-pub ahead of print - 29 Aug 2025

Keywords

  • Image Segmentation
  • Minimally Invasive Surgery
  • Surgical AI
  • Vision Transformer
