TY - JOUR
T1 - S4RoboFormer: Scribble-Supervised Surgical Robotic Segmentation Transformer via Augmented Consistency Training
AU - Wang, Ziyang
AU - Chen, Tianxiang
AU - Ye, Zi
AU - Ge, Yiyuan
AU - Chen, Zhihao
AU - Li, Jiabao
AU - Zhao, Yifan
PY - 2025/8/29
Y1 - 2025/8/29
N2 - Advancements in deep learning for surgical instrument segmentation have notably improved the proficiency, safety, and efficacy of minimally invasive robotic surgeries. The effectiveness of deep learning, however, is contingent upon the availability of large datasets for training, which are often associated with substantial annotation costs. Given the dynamic nature of surgical robots, scribble-based labeling emerges as a more viable and cost-effective alternative to traditional pixel-wise dense labeling. This paper introduces the Scribble-Supervised Surgical Robotic Segmentation Transformer (S4RoboFormer), designed to mitigate the challenges posed by resource-intensive annotations. S4RoboFormer incorporates a Vision Transformer (ViT)-based U-shaped segmentation network, enhanced with a specialized Weakly-Supervised Learning (WSL) strategy that comprises consistency training through (i) data-based perturbation using a data-mixed interpolation technique, and (ii) network-based perturbation via a self-ensembling strategy. This methodology promotes uniform predictions across different levels of perturbation under conditions of limited-signal supervision. S4RoboFormer outperforms existing state-of-the-art baseline WSL frameworks with both convolutional neural network (CNN)- and ViT-based segmentation networks on a pre-processed public dataset. The code of S4RoboFormer, all baseline methods, the pre-processed data, and the scribble simulation algorithm are publicly available at https://github.com/ziyangwang007/CV-WSL-Robot.
AB - Advancements in deep learning for surgical instrument segmentation have notably improved the proficiency, safety, and efficacy of minimally invasive robotic surgeries. The effectiveness of deep learning, however, is contingent upon the availability of large datasets for training, which are often associated with substantial annotation costs. Given the dynamic nature of surgical robots, scribble-based labeling emerges as a more viable and cost-effective alternative to traditional pixel-wise dense labeling. This paper introduces the Scribble-Supervised Surgical Robotic Segmentation Transformer (S4RoboFormer), designed to mitigate the challenges posed by resource-intensive annotations. S4RoboFormer incorporates a Vision Transformer (ViT)-based U-shaped segmentation network, enhanced with a specialized Weakly-Supervised Learning (WSL) strategy that comprises consistency training through (i) data-based perturbation using a data-mixed interpolation technique, and (ii) network-based perturbation via a self-ensembling strategy. This methodology promotes uniform predictions across different levels of perturbation under conditions of limited-signal supervision. S4RoboFormer outperforms existing state-of-the-art baseline WSL frameworks with both convolutional neural network (CNN)- and ViT-based segmentation networks on a pre-processed public dataset. The code of S4RoboFormer, all baseline methods, the pre-processed data, and the scribble simulation algorithm are publicly available at https://github.com/ziyangwang007/CV-WSL-Robot.
KW - Image Segmentation
KW - Minimally Invasive Surgery
KW - Surgical AI
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=105014625764&partnerID=8YFLogxK
UR - https://ieeexplore.ieee.org/document/11145188
U2 - 10.1109/TMRB.2025.3604103
DO - 10.1109/TMRB.2025.3604103
M3 - Article
AN - SCOPUS:105014625764
VL - 7
SP - 1789
EP - 1793
JO - IEEE Transactions on Medical Robotics and Bionics
JF - IEEE Transactions on Medical Robotics and Bionics
IS - 4
ER -