GaitFormer: Leveraging dual-stream spatial–temporal Vision Transformer via a single low-cost RGB camera for clinical gait analysis

Jiabao Li, Ziyang Wang*, Chengjun Wang, Wenhang Su

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Citations (SciVal)

Abstract

Gait analysis is an essential technique in treating patients with lower limb dysfunctions. Traditional methods often rely on expensive and complex equipment, such as wearable body sensors and multi-camera systems with marker tracking. Aiming for a more cost-effective yet accurate alternative, this paper introduces GaitFormer, a novel approach that leverages the Vision Transformer (ViT) for gait analysis using minimal, non-invasive equipment, i.e. a single low-cost RGB camera. First, a unique dataset comprising 6 walking patterns gathered from 80 volunteers is developed using a multi-camera system with marker tracking. The ViT-based GaitFormer is then proposed to automatically recognize human walking patterns through a single RGB camera. GaitFormer comprises hybrid networks for each step, including: (i) a cascaded convolutional network for 2D human key point estimation; (ii) a ViT-based dual-stream spatial–temporal network that extends the human key point information into 3D; (iii) angle features of specific lower limb key joints for clinical gait analysis, capturing the geometric, kinematic, and physical attributes of human motion; and (iv) a pure self-attention-based classification network that recognizes clinical human walking patterns. The experiments are designed to comprehensively validate each step against various related baseline methods and the multi-camera tracking system, with results demonstrating the promising performance of GaitFormer as an affordable, precise, and integrated solution. To the best of our knowledge, GaitFormer is the first hybrid CNN- and ViT-based end-to-end solution that uses a low-cost device for clinically valuable gait analysis.
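As an illustration of step (iii), a joint angle can be derived from three estimated 3D key points (for example hip, knee, and ankle). The abstract does not specify how GaitFormer defines its angle features, so the following is only a minimal sketch under that assumption; the function name `joint_angle_series` and the (T, 3) input layout are hypothetical and not taken from the paper.

```python
import numpy as np


def joint_angle_series(proximal, joint, distal):
    """Angle in degrees at `joint` between the segments joint->proximal
    and joint->distal.

    Hypothetical helper, not the paper's implementation. Inputs are
    3D points of shape (3,) or per-frame arrays of shape (T, 3).
    Zero-length segments are not handled and would divide by zero.
    """
    proximal, joint, distal = (
        np.asarray(p, dtype=float) for p in (proximal, joint, distal)
    )
    u = proximal - joint  # e.g. knee -> hip
    v = distal - joint    # e.g. knee -> ankle
    cos_theta = np.sum(u * v, axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    )
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


# Two illustrative frames: a fully extended leg and a right-angle knee.
hip = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
knee = np.array([[0.0, 0.5, 0.0], [0.0, 0.0, 0.0]])
ankle = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
knee_angles = joint_angle_series(hip, knee, ankle)
```

A per-frame sequence of such angles is one plausible input for a downstream walking-pattern classifier such as the one in step (iv).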

Original language: English
Article number: 111810
Number of pages: 15
Journal: Knowledge-Based Systems
Volume: 295
Early online date: 18 Apr 2024
Publication status: Published - 8 Jul 2024

Keywords

  • Gait analysis
  • Healthcare
  • Human pose estimation
  • Single RGB camera
  • Vision Transformer
