Modeling Human Driver Behaviors When Following Autonomous Vehicles: An Inverse Reinforcement Learning Approach

Wen, Xiao; Jian, Sisi; He, Dengbo · 2022 · IEEE ITSC

DOI: 10.1109/itsc55140.2022.9922310

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the challenge of modeling human driver behavior when following autonomous vehicles (AVs) during the transition period of mixed traffic. The authors argue that existing studies, which rely on simplified simulations or limited field experiments, fail to capture the complexity of human-AV interactions that is critical for designing safe and efficient AV controllers. To overcome these limitations, the study uses high-resolution real-world data to model the microscopic dynamics of human-driven vehicles (HVs) following AVs.

The methodology employs an inverse reinforcement learning (IRL) approach, specifically Inverse Soft-Q Learning (IQ-Learn), to retrieve the reward functions underlying human driver decisions. Adversarial IRL methods such as GAIL and AIRL train through a min-max game, which makes them sensitive to hyperparameters. IQ-Learn instead estimates a single Q-function that represents both reward and policy, converting the problem into a simpler minimization task. The optimal policy for HVs is then estimated using a deep reinforcement learning algorithm, Soft Actor-Critic (SAC).

The model is trained and validated using 264 HV-following-AV events extracted from the Waymo Open Dataset, which provides 10-Hz sensor data. The study compares the proposed model against four baselines: a physics-based Intelligent Driver Model (IDM), a data-driven Long Short-Term Memory (LSTM) network, and two adversarial IRL methods (GAIL and AIRL). Driver heterogeneity is further analyzed by clustering longitudinal maneuvering styles using principal component analysis and hierarchical clustering.

The results demonstrate that the proposed IQ-Learn model significantly outperforms conventional physics-based models, data-driven models, and previous IRL methods in terms of trajectory prediction accuracy, particularly over longer time horizons. By recovering the reward functions, the model identifies the preferred states of human drivers (such as specific speeds, spacing, and relative speeds) when following AVs. This allows for a deeper understanding of how humans adapt their driving behavior in the presence of autonomous vehicles, revealing patterns that simpler models miss.

The significance of this work lies in its ability to provide actionable insights for AV technology firms. By accurately inferring human drivers’ reward functions and behavioral adaptations, the model enables AV controllers to better predict and react to surrounding HVs, thereby improving road safety and traffic efficiency. The study highlights the potential of using large-scale, real-world datasets combined with advanced IRL techniques to move beyond simplified simulations, offering a more robust framework for understanding mixed traffic interactions.
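The reward-recovery idea in the summary can be illustrated with a small tabular sketch. It is not the authors' implementation: the paper works with continuous driving states and a SAC policy, whereas this toy assumes a discrete MDP with deterministic transitions and a temperature of 1. The function names (`soft_value`, `soft_q_iteration`, `recover_reward`) and the three-state problem are hypothetical. The sketch shows only the inverse soft Bellman relation, in which a reward is read back from a soft Q-function as r(s, a) = Q(s, a) - gamma * V(s'), with V(s) = log sum_a exp Q(s, a).

```python
import math

def soft_value(q_row):
    """Soft state value V(s) = log sum_a exp(Q(s, a)), computed stably."""
    m = max(q_row)
    return m + math.log(sum(math.exp(q - m) for q in q_row))

def soft_q_iteration(reward, next_state, gamma=0.9, sweeps=500):
    """Forward problem on the toy MDP: iterate Q(s, a) = r(s, a) + gamma * V(s')."""
    n_states, n_actions = len(reward), len(reward[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        V = [soft_value(row) for row in Q]
        Q = [[reward[s][a] + gamma * V[next_state[s][a]]
              for a in range(n_actions)] for s in range(n_states)]
    return Q

def recover_reward(Q, next_state, gamma=0.9):
    """Inverse soft Bellman step: r(s, a) = Q(s, a) - gamma * V(s')."""
    V = [soft_value(row) for row in Q]
    return [[Q[s][a] - gamma * V[next_state[s][a]]
             for a in range(len(Q[s]))] for s in range(len(Q))]

# Hypothetical 3-state, 2-action toy problem (not data from the paper).
toy_reward = [[0.0, -1.0], [1.0, 0.0], [-0.5, 2.0]]
toy_next = [[1, 2], [0, 2], [2, 0]]  # next_state[s][a], deterministic

Q_star = soft_q_iteration(toy_reward, toy_next)
recovered = recover_reward(Q_star, toy_next)
```

In this toy setting `recovered` matches `toy_reward` up to numerical error, because the Q-function was produced by the matching forward iteration. In an IRL setting the Q-function would instead be learned from demonstrations, and the same relation would be used to read off an implied reward.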

Key finding

The proposed inverse soft-Q learning model significantly outperforms conventional and data-driven car-following models in predicting human driver trajectories when following autonomous vehicles.
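For context on the conventional baseline, the Intelligent Driver Model named in the summary can be sketched in a few lines. This is a minimal illustration of the standard IDM acceleration law, not the calibrated model from the paper: every parameter value below is an illustrative assumption, and `idm_acceleration` and `simulate_follower` are hypothetical helper names. The 0.1 s step matches the 10-Hz data rate mentioned in the summary.

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, s0=2.0,
                     a_max=1.0, b=2.0, delta=4.0):
    """Standard IDM acceleration (parameter values are assumed, not from the paper).

    v   : follower speed (m/s)
    gap : bumper-to-bumper spacing to the leader (m)
    dv  : approach rate, follower speed minus leader speed (m/s)
    """
    # Desired gap; clamping the dynamic term at zero is a common implementation choice.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

def simulate_follower(lead_speeds, v_init, gap_init, dt=0.1):
    """Roll the follower forward with explicit Euler steps of length dt."""
    v, gap = v_init, gap_init
    speeds, gaps = [v], [gap]
    for v_lead in lead_speeds:
        acc = idm_acceleration(v, gap, v - v_lead)
        gap = max(gap + (v_lead - v) * dt, 0.1)  # floor avoids division by zero
        v = max(v + acc * dt, 0.0)               # no reversing
        speeds.append(v)
        gaps.append(gap)
    return speeds, gaps

# Hypothetical scenario: leader holds 15 m/s for 60 s.
speeds, gaps = simulate_follower([15.0] * 600, v_init=15.0, gap_init=30.0)
```

A rollout like this gives a predicted follower trajectory that could be compared against an observed one, for example with a root-mean-square error on spacing, which is the kind of comparison a trajectory-prediction benchmark relies on.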

Methodology

modeling

Sample size: 264

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	—	—	—	1	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-07
chunk	success	chunk	—	—	1	2026-06-07
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-07
enrich	success	semantic_scholar	—	—	4	2026-06-15
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

situational awareness

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Methodological Resource: tool software
Theoretical Contribution: computational model