Uncovering Variability in Human Driving Behavior Through Automatic Extraction of Similar Traffic Scenes from Large Naturalistic Datasets

Abbink, David A. · 2023 · unknown

Archive: archived · Pipeline: cataloged, verified


Summary

This paper addresses the challenge of analyzing variability in human driving behavior at both the operational level (how a maneuver is executed) and the tactical level (which maneuver is chosen). Naturalistic datasets such as highD, NGSIM, and pNEUMA are widely used to model driver behavior and to validate autonomous vehicles, but existing methods typically extract traffic scenarios in a way that implicitly selects for specific tactical behaviors (e.g., lane changes). This conflates the initial traffic scene with the driver's response and obscures tactical variability. The authors instead propose a method that automatically extracts similar traffic scenes (snapshots of the environment that exclude the ego vehicle's state) from large datasets, so that researchers can study how different drivers respond to identical initial conditions.

The method consists of four steps. First, a user manually selects an example traffic scene of interest. Second, the "traffic context" (the positions and velocities of the surrounding vehicles relative to the ego vehicle) is converted into a set of four-dimensional points. Because lateral and longitudinal positions carry different significance on highways, a scaling parameter $\lambda$ is applied to the lateral dimensions. Third, the Hausdorff distance between this context set and every other candidate scene in the dataset is computed as the similarity measure; to keep the computational load manageable, the authors filter candidates by lane and downsample frames. Fourth, the $N$ scenes with the smallest Hausdorff distances are selected.

The authors implemented the method as an extension to the TraViA visualization software and validated it on the highD dataset. In a case study, they extracted 250 scenes similar to a manually selected car-following scenario with three surrounding vehicles. The method identified scenes with comparable traffic contexts: 233 of the 250 results contained the same number of surrounding vehicles.
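The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' TraViA implementation: each surrounding vehicle is assumed to be a point (longitudinal position, lateral position, longitudinal velocity, lateral velocity) relative to the ego vehicle, $\lambda$ is assumed to multiply both lateral components, distances are Euclidean, and the `Scene` record, the lane check, and the `frame_step` downsampling are hypothetical stand-ins for the paper's candidate filtering. The Hausdorff distance between two context sets $A$ and $B$ is $d_H(A,B)=\max\{\max_{a\in A}\min_{b\in B}\lVert a-b\rVert,\ \max_{b\in B}\min_{a\in A}\lVert a-b\rVert\}$.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed layout of one surrounding vehicle, relative to the ego vehicle:
# (longitudinal position, lateral position, longitudinal velocity, lateral velocity)
Point = Tuple[float, float, float, float]


@dataclass
class Scene:
    scene_id: str          # hypothetical identifier
    lane: int              # lane of the ego vehicle (its state is not in `context`)
    frame: int             # frame index in the recording
    context: List[Point]   # surrounding vehicles only


def scale_lateral(points: Sequence[Point], lam: float) -> List[Point]:
    """Weight lateral position and lateral velocity by lam (the paper's lambda)."""
    return [(x, lam * y, vx, lam * vy) for x, y, vx, vy in points]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance (empty-set handling is our own convention)."""
    if not a and not b:
        return 0.0
    if not a or not b:
        return math.inf
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def find_similar(query: Scene, scenes: Sequence[Scene], lam: float,
                 n: int, frame_step: int) -> List[Tuple[str, float]]:
    """Steps 2-4: scale, pre-filter candidates, rank by Hausdorff distance, keep N."""
    q = scale_lateral(query.context, lam)
    ranked = [
        (s.scene_id, hausdorff(q, scale_lateral(s.context, lam)))
        for s in scenes
        if s.lane == query.lane                # hypothetical lane filter
        and s.frame % frame_step == 0          # hypothetical frame downsampling
        and s.scene_id != query.scene_id       # skip the query scene itself
    ]
    ranked.sort(key=lambda item: item[1])
    return ranked[:n]


if __name__ == "__main__":
    # Step 1: a manually chosen example scene (toy numbers, not highD data).
    query = Scene("q", lane=2, frame=0,
                  context=[(30.0, 0.0, -2.0, 0.0), (-20.0, 3.5, 1.0, 0.0)])
    scenes = [
        Scene("a", 2, 25, [(32.0, 0.0, -2.0, 0.0), (-21.0, 3.5, 1.0, 0.0)]),
        Scene("b", 2, 50, [(10.0, 3.5, 0.0, 0.0)]),
        Scene("c", 3, 75, [(30.0, 0.0, -2.0, 0.0)]),  # other lane: filtered out
        Scene("d", 2, 30, [(30.0, 0.0, -2.0, 0.0)]),  # off the 25-frame grid: filtered out
    ]
    # Scene "a" ranks first (distance 2.0), followed by "b".
    print(find_similar(query, scenes, lam=5.0, n=2, frame_step=25))
```

A brute-force Hausdorff computation costs $O(|A|\,|B|)$ per candidate, which is cheap for a handful of surrounding vehicles but adds up across every frame of a large dataset; this is why pruning candidates before ranking matters.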
Analysis of the subsequent driver trajectories revealed substantial variability in both tactical and operational behavior: within a three-second window, 10% of drivers changed lanes, 25% slowed down, and 65% kept their lane and speed. These results show that the method captures diverse human responses to similar initial conditions and yields data distributions useful for validating driver models.

The significance of this work lies in providing an automated, repeatable tool for extracting comparable traffic scenes without the cost and time of driving-simulator experiments. By separating the traffic context from the ego vehicle's response, the method lets researchers investigate tactical variability, which is crucial both for developing autonomous vehicles that behave in a human-like way and for validating driver models that account for multiple possible maneuvers. The authors note limitations, including the lack of a human-centric similarity metric and the exclusion of vehicle dimensions, but conclude that the method is robust and applicable to other trajectory datasets.
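A breakdown into lane changes, slowing, and maintained lane and speed can be computed over the extracted scenes with a simple labelling rule. The sketch below is hypothetical: the per-sample layout `(time, lane, speed)`, the 1 m/s slowing threshold, and the rule that a lane change takes precedence over slowing are our assumptions, not criteria stated in the paper; only the three-second window comes from the summary.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Assumed layout of one ego-vehicle sample: (time in s, lane id, speed in m/s).
# Tracks are assumed to be sorted by time, starting at the extracted scene.
Sample = Tuple[float, int, float]

LABELS = ("lane change", "slowed down", "maintained")


def classify_response(track: Sequence[Sample], window_s: float = 3.0,
                      slow_threshold: float = 1.0) -> str:
    """Label the ego response within window_s seconds after the extracted scene.

    slow_threshold (m/s) is a hypothetical cutoff, not a value from the paper.
    """
    t0, lane0, v0 = track[0]
    window = [s for s in track if s[0] - t0 <= window_s]
    if any(lane != lane0 for _, lane, _ in window):
        return "lane change"           # assumed to take precedence over slowing
    if min(v for _, _, v in window) < v0 - slow_threshold:
        return "slowed down"
    return "maintained"


def response_shares(tracks: Sequence[Sequence[Sample]]) -> Dict[str, float]:
    """Fraction of scenes per label, the kind of split reported in the case study."""
    counts = Counter(classify_response(t) for t in tracks)
    return {label: counts[label] / len(tracks) for label in LABELS}


if __name__ == "__main__":
    tracks = [
        [(0.0, 2, 30.0), (1.5, 2, 30.0), (3.0, 3, 30.0)],  # changes lane at 3 s
        [(0.0, 2, 30.0), (1.5, 2, 28.0), (3.0, 2, 27.0)],  # drops 3 m/s
        [(0.0, 2, 30.0), (1.5, 2, 29.8), (3.0, 2, 30.1)],  # small fluctuation only
        [(0.0, 2, 30.0), (3.0, 2, 30.0), (4.0, 3, 30.0)],  # lane change after window
    ]
    print(response_shares(tracks))
    # {'lane change': 0.25, 'slowed down': 0.25, 'maintained': 0.5}
```

The last toy track shows why the window matters: a lane change that happens after three seconds is counted as "maintained".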

Key finding

The proposed method successfully extracts similar traffic scenes from naturalistic data, revealing that human drivers exhibit diverse tactical and operational responses to comparable traffic contexts.

Methodology

dataset

Sample size: 250

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-27.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-27
archive	success	canonical_url	—	—	7	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-07
chunk	success	chunk	—	—	1	2026-06-07
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-07
enrich	skipped	—	—	—	5	2026-07-02
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Methodological Resource: dataset resource, tool software
Theoretical Contribution: computational model