Modeling Human Driver Behaviors When Following Autonomous Vehicles: An Inverse Reinforcement Learning Approach
DOI: 10.1109/itsc55140.2022.9922310
archive: archived pipeline: cataloged verified
Summary
This paper addresses the challenge of modeling human driver behaviors when following autonomous vehicles (AVs) during the transition period of mixed traffic. The authors argue that existing studies relying on simplified simulations or limited field experiments fail to capture the complexity of human-AV interactions, which are critical for designing safe and efficient AV controllers. To overcome these limitations, the study utilizes high-resolution real-world data to realistically model and understand the microscopic dynamics of human-driven vehicles (HVs) following AVs.
The methodology employs an inverse reinforcement learning (IRL) approach, specifically Inverse Soft-Q Learning (IQ-Learn), to retrieve the reward functions underlying human driver decisions. Unlike adversarial IRL methods like GAIL or AIRL, which suffer from sensitivity to hyperparameters due to min-max training games, IQ-Learn estimates a single Q-function representing both reward and policy, converting the problem into a simpler minimization task. The optimal policy for HVs is then estimated using a deep reinforcement learning algorithm, Soft Actor-Critic (SAC).
The model is trained and validated using 264 HV-following-AV events extracted from the Waymo Open Dataset, which provides 10-Hz sensor data. The study compares the proposed model against four baselines: a physics-based Intelligent Driver Model (IDM), a data-driven Long Short-Term Memory (LSTM) network, and two adversarial IRL methods (GAIL and AIRL). Driver heterogeneity is further analyzed by clustering longitudinal maneuvering styles using principal component analysis and hierarchical clustering. The results demonstrate that the proposed IQ-Learn model significantly outperforms conventional physics-based models, data-driven models, and previous IRL methods in terms of trajectory prediction accuracy, particularly over longer time horizons.
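For readers unfamiliar with the physics-based baseline named above, the following is a minimal sketch of the standard Intelligent Driver Model acceleration rule, stepped at 0.1 s to match the 10-Hz data rate. The parameter values (`v0`, `T`, `s0`, `a_max`, `b`, `delta`) and the helper `simulate_follower` are illustrative assumptions, not the calibrated values or code from the paper.

```python
import math


def idm_acceleration(v, gap, v_lead, v0=30.0, T=1.5, s0=2.0,
                     a_max=1.0, b=1.5, delta=4.0):
    """Standard IDM acceleration of a follower (SI units).

    v: follower speed, gap: bumper-to-bumper spacing, v_lead: leader speed.
    All default parameters are illustrative assumptions.
    """
    gap = max(gap, 1e-3)  # guard against division by zero
    dv = v - v_lead       # approach rate (positive when closing in)
    # Desired dynamic spacing s*
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def simulate_follower(lead_speeds, v_init, gap_init, dt=0.1):
    """Roll the follower forward with explicit Euler steps (hypothetical helper)."""
    v, gap = v_init, gap_init
    trajectory = []
    for v_lead in lead_speeds:
        acc = idm_acceleration(v, gap, v_lead)
        gap += (v_lead - v) * dt      # spacing changes with the speed difference
        v = max(0.0, v + acc * dt)    # speeds cannot go negative
        trajectory.append((v, gap))
    return trajectory


if __name__ == "__main__":
    # Leader (the AV) cruises at 15 m/s for 5 s; follower starts 30 m behind.
    for step, (v, gap) in enumerate(simulate_follower([15.0] * 50, 15.0, 30.0)):
        if step % 10 == 0:
            print(f"t={step * 0.1:.1f}s  v={v:.2f} m/s  gap={gap:.2f} m")
```

A model of this kind has a fixed functional form and a handful of parameters, which is why it serves as a convenient reference point against learned models such as the LSTM and the IRL methods.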
By recovering the reward functions, the model identifies the preferred states of human drivers (such as specific speeds, spacing, and relative speeds) when following AVs. This allows for a deeper understanding of how humans adapt their driving behavior in the presence of autonomous vehicles, revealing patterns that simpler models miss. The significance of this work lies in its ability to provide actionable insights for AV technology firms. By accurately inferring human drivers’ reward functions and behavioral adaptations, the model enables AV controllers to better predict and react to surrounding HVs, thereby improving road safety and traffic efficiency. The study highlights the potential of using large-scale, real-world datasets combined with advanced IRL techniques to move beyond simplified simulations, offering a more robust framework for understanding mixed traffic interactions.
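To illustrate how a single Q-function can encode both a reward and a policy, the sketch below applies the inverse soft Bellman relation used in IQ-Learn, r(s, a) = Q(s, a) − γ·V(s′) with V(s) = log Σₐ exp Q(s, a), to a toy problem. Everything here is an assumption made for illustration: the tabular Q-values, the discrete actions, the deterministic transitions in `next_state`, and an entropy temperature of 1. The paper itself works with continuous driving states and estimates the policy with SAC.

```python
import math


def soft_value(q_row):
    """Soft (log-sum-exp) state value for one state's Q-values."""
    m = max(q_row)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(q - m) for q in q_row))


def soft_policy(q_row):
    """Maximum-entropy policy implied by Q: pi(a|s) = exp(Q(s,a) - V(s))."""
    v = soft_value(q_row)
    return [math.exp(q - v) for q in q_row]


def recover_rewards(q_table, next_state, gamma=0.99):
    """Reward implied by Q under deterministic transitions (toy assumption).

    q_table[s][a] is Q(s, a); next_state[s][a] is the successor state index.
    """
    values = [soft_value(row) for row in q_table]
    return [
        [q_table[s][a] - gamma * values[next_state[s][a]]
         for a in range(len(q_table[s]))]
        for s in range(len(q_table))
    ]


if __name__ == "__main__":
    # Hypothetical two-state, two-action example.
    q_table = [[1.0, 2.0], [0.5, -0.5]]
    next_state = [[0, 1], [1, 0]]
    print("policy in state 0:", soft_policy(q_table[0]))
    print("recovered rewards:", recover_rewards(q_table, next_state, gamma=0.9))
```

In this toy setting, state-action pairs with higher recovered reward correspond to the "preferred" choices; the analogous reading in the paper is over speed, spacing, and relative speed.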
Key finding
The proposed inverse soft-Q learning model significantly outperforms conventional and data-driven car-following models in predicting human driver trajectories when following autonomous vehicles.
Methodology
modeling
Sample size: 264
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | — | — | — | 1 | 2026-05-28 |
| archive | success | canonical_url | — | — | 1 | 2026-06-06 |
| extract | success | cached | — | — | 3 | 2026-06-10 |
| clean | success | clean | — | — | 1 | 2026-06-07 |
| chunk | success | chunk | — | — | 1 | 2026-06-07 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-07 |
| enrich | success | semantic_scholar | — | — | 4 | 2026-06-15 |
| promote | success | — | — | — | 1 | 2026-06-04 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 2 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 15 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Methodological Resource: tool software
- Theoretical Contribution: computational model