TY - CONF
T1 - Quadruple-Consistency Vision Transformer for Medical Image Segmentation with Limited Number of Sparse Annotations
AU - Liu, Yufan
AU - Wang, Ziyang
AU - Chen, Tianxiang
AU - Ye, Zi
PY - 2024/9/27
Y1 - 2024/9/27
N2 - Deep learning has significantly advanced the field of medical image segmentation but typically relies on extensive, densely annotated datasets, which are both costly and time-consuming to prepare. In response to the need for reducing annotation efforts, this study investigates a novel supervision approach named Semi-Scribble Supervised Learning, which combines semi-supervised learning (SSL) and weakly-supervised learning (WSL) techniques. This approach leverages both a large volume of unlabeled data and a smaller set of sparsely annotated, scribble-based labels. We introduce the Quadruple-Consistency Vision Transformer (4C-ViT), which capitalizes on the recent success of Vision Transformers in capturing intricate image features. Specifically, the proposed 4C-ViT employs an advanced consistency training strategy that incorporates quadruple perturbations at both the data and network levels, enhancing the network's robustness and performance. The efficacy of 4C-ViT is demonstrated on a publicly available MRI cardiac segmentation benchmark, where it outperforms other baseline methods across several evaluation metrics. The proposed 4C-ViT, alongside all baseline methods and the challenging yet realistic dataset, is made publicly available at https://github.com/ziyangwang007/CVSSL-MIS.
AB - Deep learning has significantly advanced the field of medical image segmentation but typically relies on extensive, densely annotated datasets, which are both costly and time-consuming to prepare. In response to the need for reducing annotation efforts, this study investigates a novel supervision approach named Semi-Scribble Supervised Learning, which combines semi-supervised learning (SSL) and weakly-supervised learning (WSL) techniques. This approach leverages both a large volume of unlabeled data and a smaller set of sparsely annotated, scribble-based labels. We introduce the Quadruple-Consistency Vision Transformer (4C-ViT), which capitalizes on the recent success of Vision Transformers in capturing intricate image features. Specifically, the proposed 4C-ViT employs an advanced consistency training strategy that incorporates quadruple perturbations at both the data and network levels, enhancing the network's robustness and performance. The efficacy of 4C-ViT is demonstrated on a publicly available MRI cardiac segmentation benchmark, where it outperforms other baseline methods across several evaluation metrics. The proposed 4C-ViT, alongside all baseline methods and the challenging yet realistic dataset, is made publicly available at https://github.com/ziyangwang007/CVSSL-MIS.
KW - Medical Image Segmentation
KW - Semi-Supervised Learning
KW - Vision Transformer
KW - Weakly-Supervised Learning
UR - https://ieeexplore.ieee.org/document/10647711
UR - http://www.scopus.com/inward/record.url?scp=85216875925&partnerID=8YFLogxK
U2 - 10.1109/ICIP51287.2024.10647711
DO - 10.1109/ICIP51287.2024.10647711
M3 - Conference publication
AN - SCOPUS:85216875925
T3 - Proceedings - International Conference on Image Processing (ICIP)
SP - 2101
EP - 2107
BT - 2024 IEEE International Conference on Image Processing (ICIP)
CY - Abu Dhabi, United Arab Emirates
PB - IEEE
T2 - 31st IEEE International Conference on Image Processing, ICIP 2024
Y2 - 27 October 2024 through 30 October 2024
ER -