Uncovering variability in human driving behavior through automatic extraction of similar traffic scenes from large naturalistic datasets

Abbink, David A. · 2022 · arXiv (Cornell University)

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the challenge of analyzing variability in human driving behavior at both the operational level (execution of maneuvers) and the tactical level (choice of maneuver). While naturalistic datasets like highD, NGSIM, and pNEUMA are widely used for autonomous vehicle validation and driver modeling, existing methods typically extract traffic scenarios that implicitly select for specific tactical behaviors, thereby ignoring tactical variability. The authors argue that understanding how different drivers respond to the same initial traffic scene is crucial for developing realistic driver models and acceptable autonomous vehicle behaviors. To this end, they propose a novel method to automatically extract similar traffic scenes from large datasets, distinguishing the initial "traffic context" from the subsequent driver response.

The proposed method consists of four steps. First, a user manually selects an example traffic scene of interest. Second, the traffic context, defined as the positions and velocities of surrounding vehicles relative to the ego vehicle, is converted into a mathematical set of points. To account for the differing significance of longitudinal and lateral positions in highway driving, a scaling parameter $\lambda$ is applied to lateral dimensions. Third, the Hausdorff distance metric is used to calculate the similarity between this context set and all other potential scenes in the dataset. To manage computational load, the authors filter candidates by lane position. Finally, the $N$ scenes with the shortest Hausdorff distances are selected.

The method was validated in a case study using the highD dataset, which contains naturalistic vehicle trajectories recorded on German highways. The authors selected a specific car-following scenario where the ego vehicle could either maintain its lane or overtake. The algorithm successfully extracted 250 scenes with traffic contexts similar to the example. Analysis of the extracted scenes revealed significant variability in human responses: 10% of drivers changed lanes, 25% slowed down, and 65% maintained their speed and lane. The results demonstrated that the method captures both tactical variability (lane change vs. car following) and operational variability (specific velocity and position profiles). The authors noted that the method is robust to variations in the scaling parameter $\lambda$ and computationally feasible, taking approximately 3–5 hours on standard hardware.

The significance of this work lies in providing an automated, repeatable tool for investigating multi-level driving variability without the high costs and time requirements of driving-simulator experiments. By isolating the initial traffic context, the method allows researchers to study the full range of human responses to identical conditions. This capability supports the development of more comprehensive driver models and the validation of autonomous vehicles that must account for diverse human behaviors. The authors acknowledge limitations, such as the lack of a human-centric similarity metric and the exclusion of vehicle dimensions, but provide open-source code to facilitate further research and application to other datasets.
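The scene-matching steps described in this summary (context set, lateral scaling by $\lambda$, Hausdorff distance, selection of the $N$ closest scenes) can be illustrated with a minimal sketch. This is not the authors' released code. The point layout `(dx, dy, dvx, dvy)`, the choice to scale both lateral position and lateral velocity, the function names, and the toy numbers are all assumptions made for illustration, and the lane-position pre-filter is omitted.

```python
import math

# Assumed layout of one context point (hypothetical, for illustration only):
# (dx, dy, dvx, dvy) = longitudinal/lateral position and velocity of a
# surrounding vehicle relative to the ego vehicle.

def scale_context(context, lam):
    """Apply the lateral scaling factor lam to the lateral components."""
    return [(dx, lam * dy, dvx, lam * dvy) for dx, dy, dvx, dvy in context]

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def most_similar(example, candidates, n, lam=1.0):
    """Return the n (distance, index) pairs closest to the example context."""
    ex = scale_context(example, lam)
    scored = [
        (hausdorff(ex, scale_context(c, lam)), i)
        for i, c in enumerate(candidates)
    ]
    scored.sort()  # shortest Hausdorff distance first
    return scored[:n]

# Toy data: one example context and three candidate contexts.
example = [(30.0, 0.0, -2.0, 0.0), (-20.0, 3.5, 1.0, 0.0)]
candidates = [
    [(30.5, 0.5, -1.5, 0.5), (-19.5, 4.0, 1.5, 0.5)],  # nearly identical
    [(40.0, 0.0, -2.0, 0.0), (-10.0, 3.5, 1.0, 0.0)],  # shifted 10 m ahead
    [(80.0, 0.0, 5.0, 0.0)],                           # different traffic
]
ranked = most_similar(example, candidates, n=2, lam=2.0)
print(ranked)
```

The brute-force pairwise comparison is quadratic in the number of points per scene, which is acceptable for a handful of surrounding vehicles; a dataset-scale search would likely rely on pre-filtering, as the summary notes the authors do with lane position.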

Key finding

The proposed method successfully extracts similar traffic scenes from naturalistic data, revealing that human drivers exhibit significant variability in both tactical maneuvers and operational execution when responding to comparable traffic contexts.

Methodology

dataset

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-27.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-27
archive	success	canonical_url	—	—	6	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-07
chunk	success	chunk	—	—	1	2026-06-07
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-07
enrich	skipped	—	—	—	4	2026-07-02
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Methodological Resource: dataset resource, tool software
Theoretical Contribution: computational model