Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation

Corso, Anthony; Du, Peter; Driggs-Campbell, Katherine; Kochenderfer, Mykel J. · 2019 · Unknown

DOI: 10.1109/itsc.2019.8917242

archive: archived pipeline: cataloged verified

Summary

This paper addresses the limitations of Adaptive Stress Testing (AST), a simulation-based validation method for autonomous vehicles (AVs) that uses reinforcement learning to identify failure scenarios. Standard AST often discovers trivial failures where the AV is not at fault (e.g., a pedestrian running into a stopped car) and repeatedly finds similar failure modes, providing little insight into actual system weaknesses. To resolve this, the authors propose augmenting the AST reward function with domain-specific heuristics to guide the search toward relevant, diverse, and informative failure cases.

The methodology integrates two specific reward modifications into the AST framework. First, the authors incorporate Responsibility-Sensitive Safety (RSS), a formal model of driving rules, to classify AV behavior as proper or improper. The reward function is modified to prioritize trajectories where the AV behaves improperly prior to a collision, thereby filtering out scenarios where the AV is blameless. Second, a trajectory dissimilarity metric is introduced to encourage the discovery of distinct failure modes. This metric calculates the distance between new failure trajectories and previously found ones, rewarding the solver for exploring diverse regions of the failure space.

Experiments were conducted using a crosswalk scenario involving an AV controlled by an Intelligent Driver Model and pedestrians, as well as a more complex scenario with two vehicles and two pedestrians. In the single-pedestrian experiment, standard AST predominantly found scenarios where the AV was not at fault. In contrast, the RSS-augmented AST identified trajectories where the AV behaved improperly in over 25% of timesteps, revealing specific policy failures such as inadequate braking due to sensor noise. In the two-vehicle experiment, standard AST converged on a single failure mode where pedestrians caused collisions with a stopped vehicle. The dissimilarity-augmented AST, however, discovered multiple unique failure types, including vehicle-induced pedestrian collisions and vehicle-to-vehicle collisions caused by following too closely.

The significance of this work lies in its ability to make simulation-based validation more efficient and meaningful. By guiding the search algorithm with RSS and dissimilarity metrics, the enhanced AST framework identifies a wider and more expressive subset of the failure space. This allows engineers to detect actual weaknesses in AV policies rather than trivial or unavoidable accidents, thereby improving the safety validation process for autonomous systems.
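The two reward modifications described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not give the exact reward formula, so the additive form, the weights `w_rss` and `w_dis`, the mean pointwise Euclidean distance, and all function names below are assumptions chosen for illustration only.

```python
import math
from typing import Sequence

# A trajectory is a sequence of state vectors, one per timestep (assumed layout).
Trajectory = Sequence[Sequence[float]]


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean Euclidean distance between aligned timesteps.

    Assumption: trajectories are compared over the length of the shorter one.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(math.dist(a[t], b[t]) for t in range(n)) / n


def dissimilarity_bonus(traj: Trajectory, found: Sequence[Trajectory]) -> float:
    """Distance to the nearest previously found failure (0 if none exist yet)."""
    if not found:
        return 0.0
    return min(trajectory_distance(traj, f) for f in found)


def improper_fraction(improper_flags: Sequence[bool]) -> float:
    """Fraction of timesteps in which the AV was classified as behaving improperly."""
    if not improper_flags:
        return 0.0
    return sum(improper_flags) / len(improper_flags)


def augmented_reward(
    base_reward: float,
    improper_flags: Sequence[bool],
    traj: Trajectory,
    found: Sequence[Trajectory],
    w_rss: float = 1.0,  # hypothetical weight on the RSS term
    w_dis: float = 1.0,  # hypothetical weight on the dissimilarity term
) -> float:
    """Base AST reward plus two heuristic bonuses (assumed additive form)."""
    return (
        base_reward
        + w_rss * improper_fraction(improper_flags)
        + w_dis * dissimilarity_bonus(traj, found)
    )
```

Under these assumptions, a failure trajectory that repeats an already known failure earns no dissimilarity bonus, while one in which the AV was flagged as improper for more timesteps scores higher, which mirrors the search bias the summary describes.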

Key finding

Augmenting the Adaptive Stress Testing reward function with Responsibility-Sensitive Safety rules and a trajectory dissimilarity metric enables the discovery of a more diverse and relevant set of autonomous vehicle failure scenarios compared to the baseline method.
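As a rough illustration of how an RSS rule can label a timestep as proper or improper, the sketch below computes the minimum safe longitudinal following gap from the published RSS model and flags a rear vehicle that follows more closely than that. This entry does not say which RSS rules the paper used, so applying this particular rule here is an assumption, as are the function names and the parameter values in the usage note.

```python
def rss_min_longitudinal_gap(
    v_rear: float,         # speed of the following vehicle (m/s)
    v_front: float,        # speed of the lead vehicle (m/s)
    response_time: float,  # reaction time of the following vehicle (s)
    a_max_accel: float,    # max acceleration of the rear vehicle during response (m/s^2)
    b_min_brake: float,    # minimum braking the rear vehicle is assumed to apply (m/s^2)
    b_max_brake: float,    # maximum braking the lead vehicle may apply (m/s^2)
) -> float:
    """Minimum safe following gap in metres, clamped at zero."""
    # Worst-case rear speed after accelerating for the whole response time.
    v_rear_after = v_rear + response_time * a_max_accel
    gap = (
        v_rear * response_time
        + 0.5 * a_max_accel * response_time**2
        + v_rear_after**2 / (2.0 * b_min_brake)
        - v_front**2 / (2.0 * b_max_brake)
    )
    return max(gap, 0.0)


def is_following_improperly(actual_gap: float, min_gap: float) -> bool:
    """Flag the timestep as improper when the actual gap is below the safe gap."""
    return actual_gap < min_gap
```

For example, with both vehicles at 10 m/s, a 1 s response time, and illustrative limits of 2, 4, and 8 m/s^2, `rss_min_longitudinal_gap(10.0, 10.0, 1.0, 2.0, 4.0, 8.0)` gives 22.75 m, so a 15 m gap would be flagged as improper. Per-timestep flags of this kind are what a reward term favoring improper AV behavior before a collision would consume.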

Methodology

simulation_modeling

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-28
archive	success	canonical_url	—	—	7	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	success	—	—	—	1	2026-05-28
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

automation surprise

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Theoretical Contribution: computational model