STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction
DOI: 10.1109/cvpr42600.2020.01136
archive: archived · pipeline: cataloged · verified
Summary
This paper introduces STINet (Spatio-Temporal-Interactive Network), an end-to-end two-stage neural network designed for joint pedestrian detection and trajectory prediction from LiDAR point cloud sequences. The authors address the limitations of traditional modular pipelines, which separate detection, tracking, and prediction, thereby losing critical geometric and temporal information. They also critique existing end-to-end methods that rely on single-stage detectors and fail to explicitly model object-level temporal dynamics or interactions between pedestrians. STINet aims to capture comprehensive spatio-temporal representations and relational reasoning to improve both detection accuracy and trajectory prediction.
The proposed architecture consists of three main components: a backbone network, a Temporal Region Proposal Network (T-RPN), and a Proposal Prediction Network. The backbone processes a sequence of past and current point clouds using Pillar Feature Encoding and a ResUNet to generate feature maps. The T-RPN generates temporal proposals that include bounding boxes for both the current frame and past frames, allowing the network to link objects across time without explicit tracking.
The Proposal Prediction Network extracts spatio-temporal-interactive features for each proposal. This involves extracting local geometry features, local dynamic features (using a "meta box" covering the object's movement history), and history path features (displacement vectors). Crucially, the network employs a graph-based interaction layer to model relationships between neighboring pedestrians, aggregating information from surrounding objects to inform trajectory predictions. Experiments were conducted on the Lyft Dataset and the Waymo Open Dataset.
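The per-proposal geometry behind these features can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `Box`, `meta_box`, `history_path`, and `interaction` are hypothetical names; the meta box is assumed to be the axis-aligned bird's-eye-view rectangle enclosing every box of a temporal proposal; the history path is assumed to be the center displacement between consecutive frames; and a fixed softmax over negative distance stands in for the learned edge weights of the paper's interaction layer. The real network pools learned features from these regions rather than returning raw geometry.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


@dataclass
class Box:
    """Hypothetical BEV box: center (x, y), length, width, heading in radians."""
    x: float
    y: float
    length: float
    width: float
    heading: float = 0.0


def box_corners(b: Box) -> List[Vec]:
    """Four BEV corners of a (possibly rotated) box."""
    c, s = math.cos(b.heading), math.sin(b.heading)
    out = []
    for dx, dy in ((0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)):
        lx, ly = dx * b.length, dy * b.width
        out.append((b.x + c * lx - s * ly, b.y + s * lx + c * ly))
    return out


def meta_box(proposal: Sequence[Box]) -> Tuple[float, float, float, float]:
    """Assumed meta box: axis-aligned (x_min, y_min, x_max, y_max) enclosing
    every box of a temporal proposal, i.e. the area swept over the history."""
    pts = [p for b in proposal for p in box_corners(b)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


def history_path(proposal: Sequence[Box]) -> List[Vec]:
    """Assumed history path: center displacements between consecutive frames.
    Boxes must be ordered from oldest frame to current frame."""
    return [(b1.x - b0.x, b1.y - b0.y) for b0, b1 in zip(proposal, proposal[1:])]


def interaction(features: Sequence[Sequence[float]], centers: Sequence[Vec],
                radius: float) -> List[List[float]]:
    """One round of graph aggregation. Each node keeps its own feature and appends
    a weighted sum of neighbor features within `radius`. The softmax over negative
    distance is a stand-in for learned edge weights (an assumption)."""
    dim = len(features[0])
    out = []
    for i, (xi, yi) in enumerate(centers):
        nbrs = []
        for j, (xj, yj) in enumerate(centers):
            d = math.hypot(xj - xi, yj - yi)
            if j != i and d <= radius:
                nbrs.append((j, d))
        agg = [0.0] * dim  # isolated nodes get a zero interaction feature
        if nbrs:
            exps = [math.exp(-d) for _, d in nbrs]
            z = sum(exps)
            for (j, _), e in zip(nbrs, exps):
                for k in range(dim):
                    agg[k] += (e / z) * features[j][k]
        out.append(list(features[i]) + agg)
    return out
```

For a pedestrian observed in three frames as `Box(0.0, 0.0, 1.0, 0.5)`, `Box(0.5, 0.0, 1.0, 0.5)`, and `Box(1.0, 0.25, 1.0, 0.5)`, the meta box is `(-0.5, -0.25, 1.5, 0.5)` and the history path is `[(0.5, 0.0), (0.5, 0.25)]`. Concatenating the node's own feature with the aggregated neighbor feature mirrors the idea of enriching each proposal with context from surrounding pedestrians.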
On the Waymo Open Dataset, STINet achieved a bird's-eye-view (BEV) pedestrian detection Average Precision (AP) of 80.73% and an Average Displacement Error (ADE) of 33.67 cm for pedestrian trajectory prediction, establishing state-of-the-art performance on both tasks. The model also runs in real time, requiring only 74.6 ms per inference over a 100 m × 100 m range. The results indicate that explicitly modeling temporal proposals and inter-object interactions substantially improves performance over methods that treat detection and prediction separately or use coarse temporal fusion.
The significance of this work lies in its demonstration that joint optimization of detection and trajectory prediction, combined with explicit modeling of temporal dynamics and social interactions, yields superior results for autonomous driving applications. By integrating these elements into a unified two-stage framework, STINet provides a more robust solution for understanding pedestrian behavior, which is critical for safe and smooth autonomous vehicle navigation. The paper highlights the importance of capturing fine-grained temporal information and relational context, offering a new direction for end-to-end perception systems in complex environments.
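ADE, the trajectory metric quoted for STINet, is the mean Euclidean distance between predicted and ground-truth future positions. The sketch below assumes positions are (x, y) pairs in a single unit (the paper reports centimeters) and pools errors over every (agent, timestep) pair; the function name is hypothetical, and benchmark details such as the prediction horizon and the matching of detections to ground-truth tracks are not modeled.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def average_displacement_error(pred: Sequence[Sequence[Point]],
                               gt: Sequence[Sequence[Point]]) -> float:
    """Mean L2 distance between predicted and ground-truth positions,
    pooled over all agents and all future timesteps."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must cover the same agents")
    total, count = 0.0, 0
    for p_traj, g_traj in zip(pred, gt):
        if len(p_traj) != len(g_traj):
            raise ValueError("each agent's trajectories must have the same length")
        for (px, py), (gx, gy) in zip(p_traj, g_traj):
            total += math.hypot(px - gx, py - gy)
            count += 1
    if count == 0:
        raise ValueError("no positions to compare")
    return total / count
```

With `pred = [[(0, 0), (1, 0)], [(0, 0)]]` and `gt = [[(0, 0), (1, 1)], [(3, 4)]]`, the per-point errors are 0, 1, and 5, so the ADE is 2.0 in whatever unit the positions use.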
Provenance
The full processing record for this entry. Each pipeline stage is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-25 |
| archive | success | semantic_scholar | — | — | 6 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-26 |
| chunk | success | chunk | — | — | 1 | 2026-06-26 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-26 |
| enrich | success | openalex | — | — | 1 | 2026-06-26 |
| promote | success | — | — | — | 1 | 2026-06-25 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-26 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.