Abstract
Medical image segmentation has made significant strides with the development of basic models. Specifically, models that combine CNNs with transformers can successfully extract both local and global features. However, these models inherit the transformer’s quadratic computational complexity, limiting their efficiency. Inspired by the recent Receptance Weighted Key Value (RWKV) model, which achieves linear complexity for long-distance modeling, we explore its potential for medical image segmentation. Directly applying vision-RWKV yields suboptimal results due to insufficient local feature exploration and disrupted spatial continuity. To address these issues, we propose a novel nested structure, Zigzag RWKV-in-RWKV (Zig-RiR). It consists of Outer and Inner RWKV blocks that capture both global and local features without disrupting spatial continuity. We treat local patches as “visual sentences” and use the Outer Zig-RWKV to explore global information. Then, we decompose each sentence into sub-patches (“visual words”) and use the Inner Zig-RWKV to further explore local information among words, at negligible computational cost. We also introduce a Zigzag-WKV attention mechanism to ensure spatial continuity during token scanning. By aggregating visual word and sentence features, our Zig-RiR can effectively explore both global and local information while preserving spatial continuity. Experiments on four medical image segmentation datasets of both 2D and 3D modalities demonstrate the superior accuracy and efficiency of our method: when tested on 1024 × 1024 high-resolution medical images, it runs 14.4 times faster than the state-of-the-art method and reduces GPU memory usage by 89.5%.
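The spatial-continuity argument in the abstract can be illustrated with a small sketch. It assumes a simple row-wise "snake" order as a stand-in for the zigzag scan; the exact scan pattern used by Zigzag-WKV is not specified here, and the function names below are hypothetical, not taken from the authors' code. The sketch compares the largest jump between consecutive tokens under a plain raster scan and under the snake scan.

```python
def raster_order(height, width):
    """Row-major scan: every row is read left to right."""
    return [(r, c) for r in range(height) for c in range(width)]


def zigzag_order(height, width):
    """Snake scan (assumed pattern): even rows left to right, odd rows right to left."""
    order = []
    for r in range(height):
        cols = range(width) if r % 2 == 0 else range(width - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def max_step(order):
    """Largest Manhattan distance between consecutive tokens in a scan."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )


if __name__ == "__main__":
    # Raster scanning jumps from the end of one row to the start of the next.
    print("raster max step:", max_step(raster_order(4, 4)))
    # In the snake scan, consecutive tokens are always adjacent in the grid.
    print("zigzag max step:", max_step(zigzag_order(4, 4)))
```

Under this assumed pattern, every pair of consecutive tokens is adjacent in the grid, whereas the raster scan places spatially distant tokens next to each other at each row boundary.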
| Original language | English |
|---|---|
| Pages (from-to) | 3245-3257 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Medical Imaging |
| Volume | 44 |
| Issue number | 8 |
| Early online date | 17 Apr 2025 |
| Publication status | Published - Aug 2025 |
Data Access Statement
Our code is available at https://github.com/txchen-USTC/Zig-RiR
Keywords
- Medical image segmentation
- RWKV
- RWKV-in-RWKV
- zigzag scan
Fingerprint
Dive into the research topics of 'Zig-RiR: Zigzag RWKV-in-RWKV for Efficient Medical Image Segmentation'. Together they form a unique fingerprint.