When RNN Meets CNN and ViT: The Development of a Hybrid U-Net for Medical Image Segmentation

Ziru Wang, Ziyang Wang*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

1 Downloads (Pure)

Abstract

Deep learning for semantic segmentation has made significant advances in recent years, achieving state-of-the-art performance. Medical image segmentation, as a key component of healthcare systems, plays a vital role in the diagnosis and treatment planning of diseases. Due to the fractal and scale-invariant nature of biological structures, effective medical image segmentation requires models capable of capturing hierarchical and self-similar representations across multiple spatial scales. In this paper, a Recurrent Neural Network (RNN) is explored within the Convolutional Neural Network (CNN) and Vision Transformer (ViT)-based hybrid U-shape network, named RCV-UNet. First, the ViT-based layer was developed in the bottleneck to effectively capture the global context of an image and establish long-range dependencies through the self-attention mechanism. Second, recurrent residual convolutional blocks (RRCBs) were introduced in both the encoder and decoder to enhance the ability to capture local features and preserve fine details. Third, by integrating the global feature extraction capability of ViT with the local feature enhancement strength of RRCBs, RCV-UNet achieved promising global consistency and boundary refinement, addressing key challenges in medical image segmentation. From a fractal–fractional perspective, the multi-scale encoder–decoder hierarchy and attention-driven aggregation in RCV-UNet naturally accommodate fractal-like, scale-invariant regularity, while the recurrent and residual connections approximate fractional-order dynamics in feature propagation, enabling continuous and memory-aware representation learning. The proposed RCV-UNet was evaluated on four different modalities of images, including CT, MRI, Dermoscopy, and ultrasound, using the Synapse, ACDC, ISIC 2018, and BUSI datasets. Experimental results demonstrate that RCV-UNet outperforms other popular baseline methods, achieving strong performance across different segmentation tasks. The code of the proposed method will be made publicly available.
Original languageEnglish
Article number18
Number of pages26
JournalFractal and Fractional
Volume10
Issue number1
Early online date27 Dec 2025
DOIs
Publication statusPublished - 1 Jan 2026

Bibliographical note

Copyright © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Fingerprint

Dive into the research topics of 'When RNN Meets CNN and ViT: The Development of a Hybrid U-Net for Medical Image Segmentation'. Together they form a unique fingerprint.

Cite this