When CNN Meet with ViT: Towards Semi-supervised Learning for Multi-class Medical Image Semantic Segmentation

Ziyang Wang*, Tianze Li, Jian Qing Zheng, Baoru Huang

*Corresponding author for this work

Research output: Chapter in Book/Published conference output › Conference publication

36 Citations (SciVal)

Abstract

Because high-quality annotations are scarce in the medical imaging community, semi-supervised learning methods are highly valued for image semantic segmentation. This paper presents a consistency-aware, pseudo-label-based self-ensembling approach that exploits both the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) in semi-supervised learning. The proposed framework consists of a feature-learning module, in which ViT and CNN enhance each other, and a guidance module that is robust for consistency-aware supervision. In the feature-learning module, pseudo labels are inferred and used recurrently and separately by the CNN and ViT views to expand the data set, so that each view benefits the other. A perturbation scheme is designed for the feature-learning module, and network weight averaging is used to build the guidance module. The framework thereby combines the feature-learning strengths of CNN and ViT, improves performance through dual-view co-training, and enables consistency-aware supervision in a semi-supervised manner. A topological exploration of all alternative supervision modes with CNN and ViT is validated in detail, demonstrating the most promising performance and the specific setting of the method on semi-supervised medical image segmentation tasks. Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark data set across a variety of metrics. The code is publicly available (https://github.com/ziyangwang007/CV-SSL-MIS).
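The two mechanisms named in the abstract, pseudo labels exchanged between the CNN and ViT views and a guidance module built by averaging network weights, can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that the weight averaging is an exponential moving average, as is common in self-ensembling methods, and that pseudo labels are hard argmax labels. The function names `ema_update` and `pseudo_labels`, the `alpha` value, and the toy scores are all hypothetical.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               alpha: float = 0.99) -> Dict[str, float]:
    """Move each guidance (teacher) weight toward the student weight.

    Assumption: exponential moving average; the abstract only says
    "averaging network weight".
    """
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def pseudo_labels(class_scores: List[List[float]]) -> List[int]:
    """Hard pseudo label per pixel: index of the highest class score."""
    return [max(range(len(s)), key=s.__getitem__) for s in class_scores]


# Toy per-pixel class scores (2 pixels, 3 classes) from two hypothetical views.
cnn_scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
vit_scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]

# Each view's predictions become training targets for the other view.
labels_for_vit = pseudo_labels(cnn_scores)
labels_for_cnn = pseudo_labels(vit_scores)

# One averaging step for a single scalar "weight" of the guidance module.
teacher = {"w": 0.0}
student = {"w": 1.0}
teacher = ema_update(teacher, student, alpha=0.9)
```

In a real training loop the dictionaries would be replaced by full network parameter sets and the update would run after every optimizer step; the perturbation scheme mentioned in the abstract is omitted here.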

Original language: English
Title of host publication: Computer Vision – ECCV 2022 Workshops
Subtitle of host publication: Tel Aviv, Israel, October 23-27, 2022 Proceedings, Part VII
Editors: Leonid Karlinsky, Tomer Michaeli, Ko Nishino
Pages: 424-441
Number of pages: 18
Volume: 13807
ISBN (Electronic): 9783031250828
Publication status: Published - 12 Feb 2023
Event: Workshops held at the 17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Volume: 13807
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Workshops held at the 17th European Conference on Computer Vision, ECCV 2022
Country/Territory: Israel
City: Tel Aviv
Period: 23/10/22 to 27/10/22

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
