Abstract
RGB-D SLAM (Simultaneous Localization and Mapping) in dynamic environments is challenging, primarily because of moving objects. Their motion can introduce tracking errors and inaccuracies in map construction, compromising the stability and overall performance of the system. To maintain high-precision localization and mapping under such conditions, a SLAM system must effectively detect and handle dynamic objects. To this end, this paper presents a novel RGB-D SLAM method, referred to as DMSAA-SLAM (Dynamic Scene SLAM Based on Diffusion Model Self-Attention Aggregation). The core idea is to leverage a pre-trained Stable Diffusion model, particularly its self-attention layers, to handle the complexity of dynamic scenes. By employing multi-resolution aggregation combined with iterative merging and non-maximum suppression, the proposed method generates high-precision segmentation masks. These masks enable fine-grained segmentation of moving objects and the removal of dynamic feature points, thereby mitigating the impact of dynamic elements on the SLAM process and ensuring efficient and accurate tracking and mapping.
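The final step described in the abstract, discarding feature points that fall on segmented moving objects, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_dynamic_keypoints`, the `(x, y)` keypoint layout, and the boolean mask convention (True marks a dynamic pixel) are all assumptions made for illustration, and the mask is taken as given rather than produced by diffusion self-attention.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints that land on static pixels (hypothetical helper).

    keypoints: (N, 2) array-like of (x, y) pixel coordinates (assumed layout).
    dynamic_mask: (H, W) boolean array, True where a moving object was segmented.
    """
    keypoints = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    h, w = dynamic_mask.shape
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.rint(keypoints[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(keypoints[:, 1]).astype(int), 0, h - 1)
    # A keypoint is static when its pixel is not marked as dynamic.
    static = ~dynamic_mask[rows, cols]
    return keypoints[static]

# Toy 4x6 image with a dynamic region covering rows 1-2, columns 2-4.
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True
kps = np.array([[0.0, 0.0], [3.0, 1.0], [5.0, 3.0], [2.2, 2.4]])
static_kps = filter_dynamic_keypoints(kps, mask)
```

In this toy case the second and fourth keypoints fall inside the dynamic region and are dropped, so only the remaining static points would be passed on to tracking.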
| Original language | English |
|---|---|
| Article number | 113576 |
| Number of pages | 11 |
| Journal | Pattern Recognition |
| Volume | 179 |
| Issue number | Part A |
| Early online date | 25 Mar 2026 |
| Publication status | E-pub ahead of print - 25 Mar 2026 |
Bibliographical note
Copyright © 2026 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
Keywords
- Diffusion model
- Self-attention aggregation
- Dynamic scene
- Semantic segmentation
Article title: DMSAA-SLAM: RGB-D SLAM for Dynamic Scenes via Diffusion Self-Attention