TY - JOUR
T1 - GaitFormer: Leveraging dual-stream spatial–temporal Vision Transformer via a single low-cost RGB camera for clinical gait analysis
AU - Li, Jiabao
AU - Wang, Ziyang
AU - Wang, Chengjun
AU - Su, Wenhang
PY - 2024/7/8
Y1 - 2024/7/8
AB - Gait analysis is an essential technique in treating patients with lower limb dysfunctions. Traditional methods often rely on expensive and complex equipment, such as wearable body sensors and multi-camera marker-tracking systems. Aiming for a more cost-effective yet accurate alternative, this paper introduces GaitFormer, a novel approach that leverages the Vision Transformer (ViT) for gait analysis using minimal, non-invasive equipment, i.e. a single low-cost RGB camera. First, a unique dataset comprising 6 walking patterns from 80 volunteers is collected with a multi-camera marker-tracking system. The ViT-based GaitFormer is then proposed to automatically recognize human walking patterns from a single RGB camera. GaitFormer is a hybrid pipeline of four steps: (i) a cascaded convolutional network that estimates 2D human key points; (ii) a ViT-based dual-stream spatial–temporal network that extends the 2D key point information into 3D; (iii) angle features of specific lower limb key joints for clinical gait analysis, capturing the geometric, kinematic, and physical attributes of human motion; and (iv) a pure self-attention-based classification network that recognizes clinical human walking patterns. The experiments comprehensively validate each step against various related baseline methods and the multi-camera tracking system, with results demonstrating the promising performance of GaitFormer as an affordable, precise, and integrated solution. To the best of our knowledge, GaitFormer is the first hybrid CNN- and ViT-based end-to-end solution using a low-cost device for clinically valuable gait analysis.
KW - Gait analysis
KW - Healthcare
KW - Human pose estimation
KW - Single RGB camera
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85190996940&partnerID=8YFLogxK
UR - https://www.sciencedirect.com/science/article/pii/S0950705124004441?via%3Dihub
U2 - 10.1016/j.knosys.2024.111810
DO - 10.1016/j.knosys.2024.111810
M3 - Article
AN - SCOPUS:85190996940
SN - 0950-7051
VL - 295
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 111810
ER -