PedAST-GCN: Fast Pedestrian Crossing Intention Prediction Using Spatial–Temporal Attention Graph Convolution Networks

Ling, Yancheng; Ma, Zhenliang; Zhang, Qi; Xie, Bangquan; Weng, Xiaoxiong · 2024 · Crossref

DOI: 10.1109/tits.2024.3398252

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the challenge of accurately and efficiently predicting pedestrian crossing intentions for autonomous vehicles (AVs). While existing models using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) achieve high accuracy, they suffer from high computational complexity and sensitivity to image quality, hindering real-time application. Graph Convolutional Networks (GCNs) offer faster inference but often rely on limited data types, such as pose keypoints, missing critical interaction information with vehicles. To bridge this gap, the authors propose PedAST-GCN, a lightweight Spatial-Temporal Attention Graph Convolution Network designed for fast, robust prediction using multimodal inputs.

The methodology employs a GCN backbone that processes three distinct modality features: pedestrian pose keypoints, bounding boxes, and ego-vehicle speeds. The authors introduce novel graph representations for bounding boxes and vehicle speeds to preserve spatial structure and capture interaction dynamics, respectively. The model architecture consists of a Modality layer, a Backbone layer with three parallel Spatial-Temporal Attention GCN (STA-GCN) units, a Fusion layer, and a Prediction layer. The STA-GCN units utilize temporal attention to capture long-term dependencies and channel attention (for pose data) to reinforce discriminative features. The Fusion layer employs a modality attention mechanism to dynamically weight and fuse heterogeneous features from the three streams, followed by global average pooling and fully connected layers for binary classification.

The model was validated on two large-scale public datasets, JAAD and PIE, and compared against state-of-the-art CNN, RNN, and GCN models. Results demonstrate that PedAST-GCN achieves superior performance in terms of both prediction accuracy and computation time.
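For readers unfamiliar with the GCN backbone mentioned in the summary, the following is a minimal sketch of a single spatial graph convolution step. It uses the standard symmetric-normalized propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} X W), which is a generic formulation and not necessarily the exact layer used in PedAST-GCN. The four-joint skeleton, the coordinates, and the weight matrix are all hypothetical values chosen for illustration.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    # Start from the identity so every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(p, q):
    """Plain nested-list matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]


def graph_conv(a_norm, x, w):
    """One spatial graph convolution: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(a_norm, x), w)
    return [[max(0.0, v) for v in row] for row in h]


# Hypothetical toy skeleton: head - neck - hip - knee (a simple chain).
EDGES = [(0, 1), (1, 2), (2, 3)]
# Hypothetical normalized (x, y) image coordinates, one row per joint.
X = [[0.50, 0.10], [0.50, 0.25], [0.50, 0.55], [0.52, 0.80]]
# Hypothetical weights mapping 2 input channels to 3 output channels.
W = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]

A_NORM = normalized_adjacency(4, EDGES)
H = graph_conv(A_NORM, X, W)
```

Each row of `H` mixes a joint's own features with those of its skeleton neighbors. A spatial-temporal model would apply a step like this per frame and then aggregate along the time axis.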
Ablation studies confirm the effectiveness of the proposed components, specifically the lightweight GCN backbone, the novel graph designs for bounding boxes and vehicle speeds, and the attention mechanisms for capturing spatial-temporal dependencies and fusing modalities. The model also exhibits robustness across various observation lengths and in the presence of noisy data, outperforming models that rely heavily on complex preprocessing like segmentation maps.

The significance of this work lies in providing a computationally efficient solution for real-time pedestrian intention prediction suitable for onboard AV systems with limited resources. By leveraging simple but robust graph representations and attention mechanisms, PedAST-GCN reduces the reliance on high-quality image data and intensive preprocessing. This approach enhances the safety of autonomous driving by enabling timely and accurate predictions of pedestrian behavior, addressing a critical factor in vehicle-pedestrian collisions. The study highlights the potential of multimodal GCNs with attention mechanisms as a viable alternative to heavier CNN and RNN architectures for sequential prediction tasks in intelligent transportation systems.
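The modality fusion step described in the summary (attention weights over the three streams, then global average pooling and a classifier) can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' implementation: the scoring vector `score_weights`, the single-layer classifier, and all numeric values are hypothetical, and the real model may compute its attention scores differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(feature_map):
    """Average a [T][C] feature map over time, giving a length-C vector."""
    t = len(feature_map)
    channels = len(feature_map[0])
    return [sum(frame[c] for frame in feature_map) / t for c in range(channels)]


def fuse_modalities(streams, score_weights):
    """Weight each stream by a softmax attention score, then sum the streams.

    streams: dict mapping a modality name to a [T][C] feature map.
    score_weights: hypothetical learned length-C scoring vector.
    """
    names = list(streams)
    scores = [
        sum(w * v for w, v in zip(score_weights, global_average_pool(streams[n])))
        for n in names
    ]
    alphas = softmax(scores)
    t, c = len(streams[names[0]]), len(score_weights)
    fused = [[sum(a * streams[n][i][j] for a, n in zip(alphas, names))
              for j in range(c)]
             for i in range(t)]
    return fused, dict(zip(names, alphas))


def predict_crossing(fused, fc_weights, fc_bias):
    """Pool the fused map and apply one linear layer plus a sigmoid."""
    pooled = global_average_pool(fused)
    logit = sum(w * v for w, v in zip(fc_weights, pooled)) + fc_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical stream outputs with T=2 frames and C=2 channels each.
STREAMS = {
    "pose": [[1.0, 0.0], [1.0, 0.0]],
    "bbox": [[0.0, 1.0], [0.0, 1.0]],
    "speed": [[0.0, 0.0], [0.0, 0.0]],
}
FUSED, ALPHAS = fuse_modalities(STREAMS, score_weights=[1.0, 1.0])
PROB = predict_crossing(FUSED, fc_weights=[0.5, -0.5], fc_bias=0.0)
```

Because the weights come from a softmax, they sum to one and a stream with weak activations (here the all-zero `speed` stream) contributes less to the fused representation.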

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	Crossref	—	—	1	2026-06-25
archive	success	unpaywall	—	—	2	2026-06-26
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-26
chunk	success	chunk	—	—	1	2026-06-26
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-26
enrich	success	openalex	—	—	1	2026-06-26
promote	success	—	—	—	1	2026-06-25
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-26
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

pedestrian behavior perception