
DMSAA-SLAM: RGB-D SLAM for Dynamic Scenes via Diffusion Self-Attention

  • Lei Xia
  • Xiaomei Li
  • Ziyang Wang
  • Hui Chen
  • Xianxun Zhu*
  • Ling Fan
  • *Corresponding author for this work
  • Tongji University
  • Macquarie University
  • Nanyang Technological University

Research output: Contribution to journal › Article › peer-review


Abstract

In dynamic environments, RGB-D SLAM (Simultaneous Localization and Mapping) faces significant challenges, primarily because of moving objects. Their motion can introduce tracking errors and inaccuracies in map construction, compromising the stability and overall performance of the system. To maintain high-precision localization and mapping under such conditions, a SLAM system must detect and handle dynamic objects effectively. To this end, this paper presents a novel RGB-D SLAM method, DMSAA-SLAM (Dynamic Scene SLAM Based on Diffusion Model Self-Attention Aggregation). The core idea is to leverage a pre-trained stable diffusion model, particularly its self-attention layers, to handle the complexity of dynamic scenes. By combining multi-resolution aggregation with iterative merging and non-maximum suppression, the proposed method generates high-precision segmentation masks. These masks enable fine-grained segmentation of moving objects and the removal of dynamic feature points, mitigating the impact of dynamic elements on the SLAM process and supporting efficient, accurate tracking and mapping.
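
The final step described in the abstract, discarding feature points that fall on moving objects, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_static_keypoints`, the keypoint format, and the binary mask layout are all assumptions made for illustration, and the mask is taken as given rather than produced by a diffusion model.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]  # (x, y) in pixel coordinates (assumed format)


def filter_static_keypoints(
    keypoints: Sequence[Keypoint],
    dynamic_mask: Sequence[Sequence[int]],
) -> List[Keypoint]:
    """Keep only keypoints that fall outside the dynamic-object mask.

    dynamic_mask[row][col] is 1 where a pixel is labelled as belonging to a
    moving object and 0 elsewhere (hypothetical layout for this sketch).
    """
    height = len(dynamic_mask)
    width = len(dynamic_mask[0]) if height else 0
    static: List[Keypoint] = []
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        # Points outside the image bounds cannot be checked, so drop them.
        if not (0 <= row < height and 0 <= col < width):
            continue
        # Keep the point only if its pixel is not marked as dynamic.
        if dynamic_mask[row][col] == 0:
            static.append((x, y))
    return static


# Toy 4x4 mask with a 2x2 "moving object" in the centre.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
points = [(0.0, 0.0), (1.2, 1.1), (3.0, 2.0), (2.0, 2.0), (5.0, 1.0)]
static_points = filter_static_keypoints(points, mask)
```

In a full pipeline, only the surviving points would be passed on to pose estimation and mapping; how the paper integrates the mask with its tracking front end is not specified in the abstract.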
Original language: English
Article number: 113576
Number of pages: 11
Journal: Pattern Recognition
Volume: 179
Issue number: Part A
Early online date: 25 Mar 2026
Publication status: E-pub ahead of print - 25 Mar 2026

Bibliographical note

Copyright © 2026 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).

Keywords

  • Diffusion model
  • Self-attention aggregation
  • Dynamic scene
  • Semantic segmentation
