Multi-scale pedestrian intent prediction using 3D joint information as spatio-temporal representation

Sarfraz Ahmed; Ammar Al-Bazi; Chitta Saha; Sujan Rajbhandari; M. Nazmul Huda

doi:10.1016/j.eswa.2023.120077

Sarfraz Ahmed, Ammar Al-Bazi, Chitta Saha, Sujan Rajbhandari, M. Nazmul Huda*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The use of Autonomous Vehicles on public roads is rising. With road traffic accidents predicted to rise over the coming years, these vehicles must be capable of operating safely in the public domain. The field of pedestrian detection has advanced significantly in the last decade, providing high-level accuracy, with some techniques reaching near-human-level accuracy. However, further work is required for pedestrian intent prediction to reach human-level performance. One of the challenges facing current pedestrian intent predictors is the varying scale of pedestrians, particularly smaller pedestrians. This is because smaller pedestrians can blend into the background, making it difficult to detect them, track them, or apply pose estimation techniques to them. Therefore, in this work, we present a novel intent prediction approach for multi-scale pedestrians using 2D pose estimation and a Long Short-Term Memory (LSTM) architecture. The pose estimator predicts keypoints for the pedestrian across the video frames. Spatio-temporal data is generated by accumulating these keypoints across the frames. This spatio-temporal data is fed to the LSTM to classify the crossing behaviour of the pedestrians. We evaluate the performance of the proposed technique on the popular Joint Attention in Autonomous Driving (JAAD) dataset and the new, larger-scale Pedestrian Intention Estimation (PIE) dataset. Using data generalisation techniques, we show that the proposed technique outperformed the state-of-the-art techniques by up to 7%, reaching up to 94% accuracy while maintaining a comparable run-time of 6.1 ms.
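
The keypoint-accumulation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `normalise_pose` and `PoseSequenceBuffer`, the window length, and the bounding-box rescaling used to make small and large pedestrians comparable are all assumptions made for illustration. The sketch only builds the per-pedestrian sequence of pose vectors; a sequence of this shape (frames by flattened joint coordinates) is the kind of input an LSTM classifier would consume.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Keypoint = Tuple[float, float]  # (x, y) pixel coordinates of one joint


def normalise_pose(keypoints: Sequence[Keypoint]) -> List[float]:
    """Flatten one frame's keypoints, rescaled to the pose's own bounding box.

    Assumption for illustration only: dividing by the box size makes near and
    far pedestrians comparable. The paper's actual preprocessing is not stated
    in the abstract.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x0, y0 = min(xs), min(ys)
    width = max(max(xs) - x0, 1e-6)   # guard against a degenerate box
    height = max(max(ys) - y0, 1e-6)
    flat: List[float] = []
    for x, y in keypoints:
        flat.extend([(x - x0) / width, (y - y0) / height])
    return flat


class PoseSequenceBuffer:
    """Accumulates per-frame pose vectors into a fixed-length sliding window."""

    def __init__(self, window: int) -> None:
        self.window = window
        # deque(maxlen=...) silently drops the oldest frame once full
        self.frames: Deque[List[float]] = deque(maxlen=window)

    def push(self, keypoints: Sequence[Keypoint]) -> None:
        self.frames.append(normalise_pose(keypoints))

    def ready(self) -> bool:
        return len(self.frames) == self.window

    def sequence(self) -> List[List[float]]:
        """Return a (window, 2 * num_joints) nested list, oldest frame first."""
        return [list(frame) for frame in self.frames]


if __name__ == "__main__":
    buf = PoseSequenceBuffer(window=3)
    small = [(10.0, 20.0), (12.0, 24.0), (14.0, 28.0)]        # distant pedestrian
    large = [(100.0, 200.0), (120.0, 240.0), (140.0, 280.0)]  # nearby pedestrian
    for pose in (small, large, small):
        buf.push(pose)
    seq = buf.sequence()
    print(len(seq), len(seq[0]))  # frames in window, features per frame
```

In this sketch the two example poses differ only in scale, so they map to the same normalised vector; whether the published method handles scale in a comparable way is not stated in this record.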

Original language	English
Article number	120077
Number of pages	11
Journal	Expert Systems with Applications
Volume	225
Early online date	13 Apr 2023
DOIs	https://doi.org/10.1016/j.eswa.2023.120077
Publication status	Published - 1 Sept 2023

Keywords

Intent prediction
LSTM
Pedestrian detection
Pose estimation
Tracking

Access to Document

DOI: 10.1016/j.eswa.2023.120077 (Licence: CC BY 4.0)

Ahmedetal 2023 VoR: final published version, 1.74 MB (Licence: CC BY 4.0)
Copyright © 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Cite this

@article{6ccde7f784a54a63941d40d12cb8c25d,

title = "Multi-scale pedestrian intent prediction using 3D joint information as spatio-temporal representation",

abstract = "The use of Autonomous Vehicles on public roads is rising. With road traffic accidents predicted to rise over the coming years, these vehicles must be capable of operating safely in the public domain. The field of pedestrian detection has advanced significantly in the last decade, providing high-level accuracy, with some techniques reaching near-human-level accuracy. However, further work is required for pedestrian intent prediction to reach human-level performance. One of the challenges facing current pedestrian intent predictors is the varying scale of pedestrians, particularly smaller pedestrians. This is because smaller pedestrians can blend into the background, making it difficult to detect them, track them, or apply pose estimation techniques to them. Therefore, in this work, we present a novel intent prediction approach for multi-scale pedestrians using 2D pose estimation and a Long Short-Term Memory (LSTM) architecture. The pose estimator predicts keypoints for the pedestrian across the video frames. Spatio-temporal data is generated by accumulating these keypoints across the frames. This spatio-temporal data is fed to the LSTM to classify the crossing behaviour of the pedestrians. We evaluate the performance of the proposed technique on the popular Joint Attention in Autonomous Driving (JAAD) dataset and the new, larger-scale Pedestrian Intention Estimation (PIE) dataset. Using data generalisation techniques, we show that the proposed technique outperformed the state-of-the-art techniques by up to 7\%, reaching up to 94\% accuracy while maintaining a comparable run-time of 6.1 ms.",

keywords = "Intent prediction, LSTM, Pedestrian detection, Pose estimation, Tracking",

author = "Sarfraz Ahmed and Ammar Al-Bazi and Chitta Saha and Sujan Rajbhandari and Huda, {M. Nazmul}",

note = "Copyright {\textcopyright} 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).",

year = "2023",

month = sep,

day = "1",

doi = "10.1016/j.eswa.2023.120077",

language = "English",

volume = "225",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier",

}

TY - JOUR

T1 - Multi-scale pedestrian intent prediction using 3D joint information as spatio-temporal representation

AU - Ahmed, Sarfraz

AU - Al-Bazi, Ammar

AU - Saha, Chitta

AU - Rajbhandari, Sujan

AU - Huda, M. Nazmul

PY - 2023/9/1

Y1 - 2023/9/1

N2 - The use of Autonomous Vehicles on public roads is rising. With road traffic accidents predicted to rise over the coming years, these vehicles must be capable of operating safely in the public domain. The field of pedestrian detection has advanced significantly in the last decade, providing high-level accuracy, with some techniques reaching near-human-level accuracy. However, further work is required for pedestrian intent prediction to reach human-level performance. One of the challenges facing current pedestrian intent predictors is the varying scale of pedestrians, particularly smaller pedestrians. This is because smaller pedestrians can blend into the background, making it difficult to detect them, track them, or apply pose estimation techniques to them. Therefore, in this work, we present a novel intent prediction approach for multi-scale pedestrians using 2D pose estimation and a Long Short-Term Memory (LSTM) architecture. The pose estimator predicts keypoints for the pedestrian across the video frames. Spatio-temporal data is generated by accumulating these keypoints across the frames. This spatio-temporal data is fed to the LSTM to classify the crossing behaviour of the pedestrians. We evaluate the performance of the proposed technique on the popular Joint Attention in Autonomous Driving (JAAD) dataset and the new, larger-scale Pedestrian Intention Estimation (PIE) dataset. Using data generalisation techniques, we show that the proposed technique outperformed the state-of-the-art techniques by up to 7%, reaching up to 94% accuracy while maintaining a comparable run-time of 6.1 ms.

KW - Intent prediction

KW - LSTM

KW - Pedestrian detection

KW - Pose estimation

KW - Tracking

UR - http://www.scopus.com/inward/record.url?scp=85153085419&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2023.120077

DO - 10.1016/j.eswa.2023.120077

M3 - Article

SN - 0957-4174

VL - 225

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 120077

ER -
