Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation

O’Kelly, Matthew; Sinha, Aman; Namkoong, Hongseok; Duchi, John C.; Tedrake, Russ · 2018 · OpenAlex-citations

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the critical challenge of rigorously and scalably testing autonomous vehicles (AVs), particularly those employing deep-learning perception and control systems. Real-world testing is prohibitively expensive and dangerous, requiring billions of miles to statistically validate safety due to the rarity of accidents. Conversely, formal verification is often intractable for complex, black-box neural networks and struggles with subjective fault assignment. To bridge this gap, the authors propose a risk-based framework that estimates the probability of an accident under a base distribution of standard traffic behavior, treating the AV policy as a black box.
The methodology employs a photo-realistic, physics-based simulation framework capable of parallelized, faster-than-real-time evaluations. The authors first learn a generative model for human-like driving behaviors ($P_0$) using imitation learning, specifically an ensemble of Generative Adversarial Imitation Learning (GAIL) models trained on the NGSim highway traffic dataset. This ensemble characterizes a distribution of human policies rather than a single deterministic one. To efficiently estimate the rare-event probability of accidents ($p_\gamma$), the authors apply adaptive importance sampling using the cross-entropy method. This algorithm iteratively approximates an optimal importance sampling distribution ($P_\theta$) that generates dangerous scenarios more frequently than the base distribution. The implementation handles high-dimensional search spaces by solving convex optimization problems in each iteration and computing likelihoods in logarithmic scale to ensure numerical stability.
Experiments were conducted on a multi-agent highway scenario using Unreal Engine 4, evaluating both a non-vision policy (using lidar) and a vision-based end-to-end deep learning policy.
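
The cross-entropy procedure described above can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a standard Gaussian base distribution in place of the learned $P_0$, a mean-shifted Gaussian family for $P_\theta$ (so the update has a closed form instead of a convex solve), and a hypothetical risk score whose large values stand in for dangerous rollouts. All function names and parameters (`n`, `rho`, `max_iters`) are illustrative. Likelihood ratios are kept in log scale until the final sum, in the spirit of the numerical-stability point made in the summary.

```python
import math
import numpy as np


def log_likelihood_ratio(x, theta):
    """log p0(x) - log p_theta(x) for p0 = N(0, I) and p_theta = N(theta, I)."""
    return -x @ theta + 0.5 * theta @ theta


def cross_entropy_estimate(score, dim, gamma, n=5000, rho=0.1, max_iters=50, seed=0):
    """Estimate P0(score(X) >= gamma) with a mean-shifted Gaussian proposal.

    Toy assumptions: the base distribution is N(0, I) and the proposal family
    is N(theta, I). Neither is the paper's learned traffic model.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for _ in range(max_iters):
        x = rng.standard_normal((n, dim)) + theta      # sample from P_theta
        s = score(x)
        # Intermediate level: the top rho fraction of scores, capped at gamma.
        level = min(gamma, np.quantile(s, 1.0 - rho))
        elite = s >= level
        # Importance weights, shifted by their maximum before exponentiating.
        log_w = log_likelihood_ratio(x[elite], theta)
        w = np.exp(log_w - log_w.max())
        # Closed-form cross-entropy update for a Gaussian mean.
        theta = (w[:, None] * x[elite]).sum(axis=0) / w.sum()
        if level >= gamma:
            break
    # Final importance-sampling estimate under the adapted proposal.
    x = rng.standard_normal((n, dim)) + theta
    hit = score(x) >= gamma
    if not hit.any():
        return 0.0, theta
    log_w = log_likelihood_ratio(x[hit], theta)
    m = log_w.max()
    p_hat = math.exp(m) * np.exp(log_w - m).sum() / n
    return float(p_hat), theta


if __name__ == "__main__":
    dim = 5
    toy_score = lambda x: x.sum(axis=1) / math.sqrt(dim)  # N(0, 1) under the base
    p_hat, theta = cross_entropy_estimate(toy_score, dim, gamma=4.0)
    exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))          # about 3.17e-5
    print(f"estimate={p_hat:.3e} exact={exact:.3e}")
```

With this toy score the exact answer is the standard normal tail probability beyond 4, which gives a reference value to compare the estimate against. A naive Monte Carlo run with the same 5000 samples would most often see no events at all at this probability level.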
The results demonstrate that the cross-entropy method accelerates the assessment of rare-event probabilities by 2–20 times compared to naive Monte Carlo sampling, producing significantly more rare events and lower-variance estimators. Furthermore, the distributed architecture allows simulation rollouts to run up to 30 times faster than real time per processor, for an overall speedup of 10–300P times over real-world testing, where P is the number of processors. Qualitative analysis revealed that the learned importance sampling distributions effectively identified adversarial conditions, such as environmental vehicles boxing in the ego-vehicle or increasing trailing speeds, which led to frequent sideswiping and other dangerous interactions.
The significance of this work lies in providing a scalable, rigorous platform for evaluating AV systems without requiring white-box access to the vehicle’s algorithms. By focusing on probabilistic safety metrics rather than binary correctness, the framework avoids the logical inconsistencies of formal verification while offering a safer and faster alternative to real-world testing. The ability to rank dangerous scenarios by their likelihood under standard traffic conditions also aids engineers in understanding failure modes and prioritizing improvements to AV policies. This approach represents a step toward reliably deploying deep-learning systems in safety-critical applications.
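
The comparison with naive Monte Carlo can be made concrete with the standard error of a plain frequency estimate. For an event of probability p estimated from n independent rollouts, the relative standard error is sqrt((1 - p) / (n p)), so the number of rollouts needed grows roughly as 1/p. The helper names and the example probability below are illustrative assumptions, not figures from the paper.

```python
import math


def naive_mc_relative_error(p, n):
    """Relative standard error of the naive Monte Carlo estimate of p from n i.i.d. rollouts."""
    return math.sqrt((1.0 - p) / (n * p))


def naive_mc_samples_needed(p, rel_err):
    """Rollouts needed for the naive estimate of p to reach the given relative standard error."""
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


if __name__ == "__main__":
    p = 1e-4  # hypothetical accident probability per rollout, chosen only for illustration
    n = naive_mc_samples_needed(p, rel_err=0.1)
    print(n, naive_mc_relative_error(p, n))
```

At a hypothetical event probability of 1e-4, reaching a 10% relative standard error takes on the order of a million rollouts. This is the cost that an importance sampling distribution concentrated on dangerous scenarios aims to reduce.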

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	OpenAlex-citations	—	—	1	2026-06-18
archive	success	openalex	—	—	5	2026-06-25
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-18
chunk	success	chunk	—	—	1	2026-06-18
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-18
promote	success	—	—	—	1	2026-06-18
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-18
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

dms validation

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Methodological Resource: tool software
Theoretical Contribution: computational model