Abstract
Medical image analysis has benefited from deep learning techniques not only because of network architecture engineering, but also because of large numbers of high-quality annotations, which are time- and labour-intensive to produce. Motivated by the recent success of the Vision Transformer (ViT), we propose to explore the power of ViT for medical image semantic segmentation in an advanced Semi-Supervised Learning (SSL) fashion, via MixUp-based interpolation consistency training and adversarial training. To train a Segmentation ViT model (sViT) with labelled and unlabelled data simultaneously, this paper proposes an adversarial SSL framework that consists of an sViT and an evaluation model (EM). During the adversarial training process, the EM is trained to classify whether an inference of the sViT comes from a labelled or an unlabelled sample, and the sViT is initialized and trained against the EM (i.e. so that every inference by the sViT is of high enough quality to be classified as if it came from labelled data). To further boost the performance of the sViT, MixUp-based interpolation consistency training is introduced for the sViT. The adversarial training is designed separately for the sViT and the EM and applied in an iterative manner, while MixUp is used solely for the sViT. Experimental results (including experiments that replace the sViT with a CNN) demonstrate that the proposed method performs competitively against other SSL methods on a public benchmark data set across a variety of metrics. The code is publicly available on GitHub.
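As a rough illustration of the MixUp-based interpolation consistency idea mentioned in the abstract, the sketch below computes a consistency loss on two unlabelled images: the prediction for a mixed input is compared with the same mixture of the two individual predictions. It is a minimal sketch under stated assumptions, not the authors' implementation. `toy_segmenter` is a hypothetical per-pixel stand-in for the sViT, images are flattened lists of intensities, the Beta(0.5, 0.5) mixing distribution and the mean-squared-error penalty are illustrative choices, and the same model produces both predictions and targets (interpolation consistency training is often paired with a separate teacher network, which is omitted here).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_segmenter(image, weight=2.0, bias=-1.0):
    """Hypothetical stand-in for sViT: maps each pixel to a foreground probability."""
    return [sigmoid(weight * p + bias) for p in image]


def mixup(a, b, lam):
    """Pixel-wise convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def ict_loss(model, u1, u2, lam):
    """Mean squared error between model(mix(u1, u2)) and mix(model(u1), model(u2))."""
    pred_of_mix = model(mixup(u1, u2, lam))
    mix_of_pred = mixup(model(u1), model(u2), lam)
    return sum((p - q) ** 2 for p, q in zip(pred_of_mix, mix_of_pred)) / len(u1)


# Two unlabelled "images" of 16 pixels each (assumed toy data).
rng = random.Random(0)
u1 = [rng.random() for _ in range(16)]
u2 = [rng.random() for _ in range(16)]

# Mixing coefficient drawn from a Beta distribution (parameters are an assumption).
lam = rng.betavariate(0.5, 0.5)
loss = ict_loss(toy_segmenter, u1, u2, lam)
print(f"lambda={lam:.3f}, consistency loss={loss:.6f}")
```

In a full training loop this term would be added, with some weighting, to the supervised segmentation loss and to the adversarial loss that the sViT receives from the EM; how the terms are weighted is not stated in the abstract.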
| Original language | English |
|---|---|
| Number of pages | 13 |
| Publication status | Published - 2022 |
| Event | 33rd British Machine Vision Conference Proceedings, BMVC 2022 - London, United Kingdom. Duration: 21 Nov 2022 → 24 Nov 2022 |
Conference
| Conference | 33rd British Machine Vision Conference Proceedings, BMVC 2022 |
|---|---|
| Country/Territory | United Kingdom |
| City | London |
| Period | 21/11/22 → 24/11/22 |