SG-LSTM: Social Group LSTM for Robot Navigation Through Dense Crowds

Bhaskara, Rashmi; Chiu, Maurice; Bera, Aniket · 2023 · OpenAlex-citations

DOI: 10.1109/iros55552.2023.10341954

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the challenge of socially compliant robot navigation in dense crowds, where existing algorithms often treat pedestrians as individual obstacles, leading to inefficient paths or the disruption of social groups. The authors propose the Social Group Long Short-term Memory (SG-LSTM) model, designed to predict pedestrian trajectories by leveraging group dynamics rather than modeling individuals in isolation. This approach aims to improve prediction accuracy, reduce computational load, and enable robots to navigate without breaking up pedestrian groups, thereby adhering to unwritten social norms.

The methodology employs a hierarchical architecture that decouples pedestrians into groups and individuals. First, a CNN-based group learning algorithm detects perceptual groups in RGB and depth frames, handling occlusions and varying scales. Spatial coordinates for these groups and ungrouped pedestrians are calculated using depth maps and camera field-of-view parameters. These coordinates are fed into the SG-LSTM model, which uses a social pooling layer to capture interactions between groups. By treating each group as a single entity, the model significantly reduces the number of LSTM units required compared to individual-based models. The predicted trajectories are then used by a Generalized Velocity Obstacles (GVO) navigation system to compute collision-free paths for a robot with car-like kinematic constraints.

The system was evaluated on the ETH, Hotel, and MOT15 datasets and on a new proprietary dataset containing over 30,000 labeled frames of pedestrian groups. Experimental results demonstrate that SG-LSTM outperforms baseline methods, including Linear, Vanilla LSTM, O-LSTM, and S-LSTM, in trajectory prediction accuracy. On the ETH dataset, SG-LSTM achieved an Average Displacement Error of 0.35 and a Final Displacement Error of 0.68, compared to 0.50 and 1.07 for S-LSTM, respectively. Similar improvements were observed on the MOT15 and proprietary datasets. Crucially, the group-optimized approach reduced average runtime by over 50% compared to S-LSTM (roughly 55%), requiring only 45ms versus 101ms on densely crowded scenes. This efficiency makes the model more suitable for deployment on resource-constrained edge devices.

The significance of this work lies in its contribution to socially aware robotics, offering a more efficient and accurate method for navigating complex human environments. By releasing a substantial annotated dataset and demonstrating superior performance in both accuracy and speed, the authors provide a robust foundation for future research in crowd-aware navigation. The study highlights that modeling group cohesion not only respects social norms but also enhances computational efficiency, enabling smoother and more natural robot interactions in public spaces like airports and hospitals.
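The two accuracy metrics quoted in the summary can be illustrated with a short sketch. It assumes the standard definitions: Average Displacement Error (ADE) is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and Final Displacement Error (FDE) is that distance at the last step only. The function name `displacement_errors` and the toy trajectories are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each time step
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(errors) / len(errors)  # mean error over all predicted steps
    fde = errors[-1]                 # error at the final predicted step
    return ade, fde


# Toy data (assumed for illustration only): the prediction drifts away
# from the ground truth by 0.5 per step along the y axis.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
ade, fde = displacement_errors(predicted, ground_truth)
```

In this toy case the per-step errors are 0, 0.5, and 1.0, so FDE is larger than ADE. The figures reported in the summary follow the same pattern (0.68 against 0.35 on ETH), as expected when prediction error grows over the horizon.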

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage      Outcome  Tool                Model                    Prompt   Attempts  Completed
discover   success  OpenAlex-citations  -                        -        1         2026-06-25
archive    success  semantic_scholar    -                        -        6         2026-06-26
extract    success  cached              -                        -        2         2026-06-26
clean      success  clean               -                        -        1         2026-06-25
chunk      success  chunk               -                        -        1         2026-06-25
embed      success  embed               Qwen/Qwen3-Embedding-8B  -        1         2026-06-25
promote    success  -                   -                        -        1         2026-06-25
summarize  success  llm                 qwen3.6-27b-prismaquant  summ-v5  1         2026-06-26
tag        success  vector_similarity   -                        -        6         2026-06-25
verify     success  -                   -                        -        1         2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.