Zig-RiR: Zigzag RWKV-in-RWKV for Efficient Medical Image Segmentation

  • Tianxiang Chen
  • Xudong Zhou
  • Zhentao Tan
  • Yue Wu
  • Ziyang Wang
  • Zi Ye*
  • Tao Gong
  • Qi Chu
  • Nenghai Yu
  • Le Lu
  • *Corresponding author for this work
  • University of Science and Technology of China (Hefei)
  • Anhui Province Key Laboratory of Digital Security
  • CCCD Key Lab of Ministry of Culture and Tourism
  • Institute of Intelligent Machines Chinese Academy of Sciences
  • Alibaba Group Holding Limited
  • Institute of Intelligent Software
  • Alibaba Group, USA

Research output: Contribution to journal › Article › peer-review

20 Citations (SciVal)

Abstract

Medical image segmentation has made significant strides with the development of basic models. Specifically, models that combine CNNs with transformers can successfully extract both local and global features. However, these models inherit the transformer’s quadratic computational complexity, limiting their efficiency. Inspired by the recent Receptance Weighted Key Value (RWKV) model, which achieves linear complexity for long-distance modeling, we explore its potential for medical image segmentation. Directly applying vision-RWKV yields suboptimal results because of insufficient local feature exploration and disrupted spatial continuity. To address these issues, we propose a novel nested structure, Zigzag RWKV-in-RWKV (Zig-RiR). It consists of Outer and Inner RWKV blocks that capture both global and local features without disrupting spatial continuity. We treat local patches as “visual sentences” and use the Outer Zig-RWKV to explore global information. Then, we decompose each sentence into sub-patches (“visual words”) and use the Inner Zig-RWKV to further explore local information among words, at negligible computational cost. We also introduce a Zigzag-WKV attention mechanism to ensure spatial continuity during token scanning. By aggregating visual word and sentence features, our Zig-RiR can effectively explore both global and local information while preserving spatial continuity. Experiments on four medical image segmentation datasets of both 2D and 3D modalities demonstrate the superior accuracy and efficiency of our method: it runs 14.4 times faster than the state-of-the-art method and reduces GPU memory usage by 89.5% when testing on 1024 × 1024 high-resolution medical images.
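The two ideas in the abstract that lend themselves to a small illustration are the sentence/word decomposition and the zigzag token scan. The sketch below is not the authors' implementation (see the repository under the Data Access Statement for that). It assumes that "zigzag" means a serpentine scan in which every other row is traversed in reverse, and that sentences and words are non-overlapping square patches. The function names, the 8 × 8 toy grid, and the patch sizes 4 and 2 are all hypothetical choices made for illustration.

```python
def raster_order(height, width):
    """Plain row-major scan: every row is read left to right."""
    return list(range(height * width))


def zigzag_order(height, width):
    """Serpentine scan (assumed meaning of 'zigzag'): odd rows are reversed."""
    order = []
    for row in range(height):
        cols = range(width) if row % 2 == 0 else range(width - 1, -1, -1)
        order.extend(row * width + col for col in cols)
    return order


def max_step(order, width):
    """Largest Manhattan distance between consecutive tokens in a scan."""
    steps = []
    for a, b in zip(order, order[1:]):
        (row_a, col_a), (row_b, col_b) = divmod(a, width), divmod(b, width)
        steps.append(abs(row_a - row_b) + abs(col_a - col_b))
    return max(steps)


def split_into_patches(grid, patch):
    """Split a 2D list into non-overlapping patch x patch blocks (row-major)."""
    height, width = len(grid), len(grid[0])
    assert height % patch == 0 and width % patch == 0
    return [
        [row[left:left + patch] for row in grid[top:top + patch]]
        for top in range(0, height, patch)
        for left in range(0, width, patch)
    ]


# Toy 8 x 8 "image" whose pixel value is its row-major index.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

# Outer level: four 4 x 4 "visual sentences" (hypothetical patch size).
sentences = split_into_patches(image, 4)

# Inner level: each sentence becomes four 2 x 2 "visual words".
words = [split_into_patches(sentence, 2) for sentence in sentences]
```

In this toy setting, consecutive tokens in the serpentine scan are always grid neighbours, whereas the raster scan jumps across the full row width at each row boundary. That is one way to read the "spatial continuity" the abstract refers to.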

Original language: English
Pages (from-to): 3245-3257
Number of pages: 13
Journal: IEEE Transactions on Medical Imaging
Volume: 44
Issue number: 8
Early online date: 17 Apr 2025
Publication status: Published - Aug 2025

Data Access Statement

Our code is available at https://github.com/txchen-USTC/Zig-RiR

Keywords

  • Medical image segmentation
  • RWKV
  • RWKV-in-RWKV
  • zigzag scan
