Modeling Human-Like Car-Following Behavior: A GRPO Approach for Region-Specific Adaptation

Liu, Yang; He, Dengbo · 2026 · Journal of Physics: Conference Series

DOI: 10.1088/1742-6596/3211/1/012006

archive: archived · pipeline: cataloged · verified


Summary

This study addresses the challenge of modeling human-like car-following (CF) behavior for autonomous vehicles (AVs) to improve safety and efficiency in mixed traffic. As AVs become more common, their behavioral similarity to human drivers is crucial for predictability and acceptance. The authors propose using Group Relative Policy Optimization (GRPO), applied here as an imitation learning approach, to capture nuanced human decision-making patterns. The method is motivated by the limitations of traditional imitation learning, which often suffers from demonstration bias and distribution shift when it encounters unseen states. GRPO optimizes a policy by comparing sampled outputs against a group baseline, which reduces bias and improves generalization.

The research formulates car-following as a Markov Decision Process (MDP) in which the agent (the ego vehicle) controls acceleration and deceleration based on a state comprising relative distance, ego velocity, lead-vehicle velocity, and relative speed. GRPO was trained with a reward function that minimizes the discrepancy between generated and real-world trajectories while penalizing collisions. To evaluate performance, the authors compared GRPO against four baseline models: the Intelligent Driver Model (IDM), Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Generative Adversarial Imitation Learning (GAIL).

The evaluation used two datasets to test region-specific adaptation: the NGSIM dataset from the United States and a self-collected CN-Truck dataset from Chinese highways. Car-following segments were extracted based on specific criteria, yielding 999 segments from NGSIM and 354 from CN-Truck. The GRPO model converged after approximately 100 training epochs and outperformed all baseline models in replicating human-like spacing and velocity profiles.
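The MDP formulation and group-baseline comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `CFState`, `step`, `reward`, and `group_relative_advantages` are hypothetical, and the time step, collision penalty, sign convention for relative speed, kinematic update, and standard-deviation normalization are all assumptions that the summary does not specify.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class CFState:
    """Car-following state with the quantities listed in the summary (SI units assumed)."""
    gap: float         # relative distance to the lead vehicle, m
    ego_speed: float   # ego-vehicle velocity, m/s
    lead_speed: float  # lead-vehicle velocity, m/s

    @property
    def rel_speed(self) -> float:
        # Relative speed as lead minus ego (sign convention is an assumption).
        return self.lead_speed - self.ego_speed


def step(state: CFState, accel: float, lead_speed_next: float, dt: float = 0.1) -> CFState:
    """Apply one acceleration action using simple kinematics (assumed update rule)."""
    ego_next = max(0.0, state.ego_speed + accel * dt)  # no reversing
    lead_avg = 0.5 * (state.lead_speed + lead_speed_next)
    ego_avg = 0.5 * (state.ego_speed + ego_next)
    return CFState(state.gap + (lead_avg - ego_avg) * dt, ego_next, lead_speed_next)


def reward(sim: CFState, real: CFState, collision_penalty: float = 100.0) -> float:
    """Negative squared discrepancy to the recorded state, or a fixed penalty on collision."""
    if sim.gap <= 0.0:
        return -collision_penalty  # penalty magnitude is an assumed value
    return -((sim.gap - real.gap) ** 2 + (sim.ego_speed - real.ego_speed) ** 2)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each sampled rollout against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: three rollouts sampled from the same starting state, scored by total reward.
advantages = group_relative_advantages([-4.0, -1.0, -2.5])
```

Rollouts that track the recorded trajectory more closely than their group average receive positive advantages, so no separately learned value function is needed as a baseline in this sketch.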
Quantitative analysis showed that GRPO achieved the lowest Mean Squared Error (MSE) for both spacing and velocity across both datasets. For instance, on the NGSIM test set, GRPO achieved an MSE of 25.73 for spacing and 1.02 for velocity, compared to higher errors for the other models. However, like the other reinforcement learning approaches, GRPO exhibited relatively high jerk values, indicating less smooth acceleration changes than the IDM. No significant differences in Time-To-Collision (TTC) were observed among the models, as the focus was on behavior imitation rather than safety optimization.

The study concludes that GRPO is highly effective for generating human-like car-following behavior and demonstrates feasibility for region-specific adaptation in mixed traffic. The findings highlight substantial differences in driving styles between the US and Chinese datasets, underscoring the importance of diverse training data. While GRPO excels in trajectory imitation, the authors note that future work should incorporate jerk and TTC into the reward function to develop smoother and safer driving algorithms. This research represents the first application of the GRPO framework to car-following modeling, offering a robust method for enhancing AV behavior in complex traffic scenarios.
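For readers unfamiliar with the three evaluation quantities named above, the sketch below shows one common way to compute them. The function names, the finite-difference jerk estimate, the sampling interval, and the treatment of non-closing gaps as infinite TTC are assumptions for illustration; the paper's exact definitions may differ.

```python
import math


def mse(simulated: list[float], observed: list[float]) -> float:
    """Mean squared error between a simulated and an observed series."""
    if not simulated or len(simulated) != len(observed):
        raise ValueError("series must be non-empty and of equal length")
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(simulated)


def jerk(accels: list[float], dt: float) -> list[float]:
    """Finite-difference jerk (m/s^3) from accelerations sampled every dt seconds."""
    return [(a1 - a0) / dt for a0, a1 in zip(accels, accels[1:])]


def time_to_collision(gap: float, ego_speed: float, lead_speed: float) -> float:
    """Seconds until the gap closes at current speeds; infinite if the ego is not closing in."""
    closing_speed = ego_speed - lead_speed
    return gap / closing_speed if closing_speed > 0 else math.inf


# Usage with made-up numbers (not data from the paper):
spacing_error = mse([20.0, 21.0, 22.0], [20.0, 21.0, 24.0])
ttc = time_to_collision(gap=20.0, ego_speed=15.0, lead_speed=10.0)
```

Adding jerk or TTC terms to the reward, as the authors suggest for future work, would amount to penalizing large values of the first and small values of the second.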

Key finding

The GRPO model demonstrated superior performance in replicating human-like car-following spacing and velocity trajectories compared to baseline models, though it produced less smooth acceleration profiles.

Methodology

modeling

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	—	—	—	1	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	skipped	—	—	—	3	2026-06-04
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

following distance

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Theoretical Contribution: computational model