Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge

Wang, Yuhang; Huang, Heye; Xu, Zhenhua; Sun, Kailai; Baoshen, Guo; Zhao, Jin-Hua · 2025 · ArXiv.org

DOI: 10.48550/arxiv.2511.20726

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the critical challenge of validating autonomous driving systems against rare, long-tail events and complex multi-agent interactions, which are scarce in real-world datasets. Existing scenario generation methods face significant limitations: rule-based approaches lack behavioral diversity, data-driven methods cannot synthesize unseen risks, and recent generative models struggle to balance realism, controllability, and scalability. To bridge this gap, the authors propose a high-fidelity framework that integrates a Conditional Variational Autoencoder (CVAE) with a Large Language Model (LLM) to generate safety-critical scenarios that are both physically consistent and risk-sensitive.

The methodology employs a dual-module architecture. First, a CVAE-GNN module learns latent spatiotemporal representations from large-scale naturalistic datasets (nuScenes and highD). The encoder uses graph neural networks to capture multi-agent dependencies and lane topology, producing a latent variable that represents stochastic traffic behavior. The decoder reconstructs physically coherent base scenarios from this latent space. Second, an LLM-guided optimization module acts as an adversarial reasoning engine. The LLM parses structured scene descriptions (including trajectories, map context, and relational cues) using chain-of-thought reasoning to identify event types and potential risks. It then dynamically generates adaptive loss weights for specific metrics, such as Time-to-Collision (TTC), minimum lateral distance, and yaw rate. These weights guide the optimization of the CVAE's latent space, ensuring the generated scenarios align with desired risk levels while maintaining physical plausibility.

Experiments conducted in CARLA and SMARTS simulators demonstrate that the framework substantially increases the coverage of high-risk and long-tail events compared to existing rule- or data-driven methods. The approach improves the consistency between simulated and real-world traffic distributions and exposes autonomous driving systems to interactions that are significantly more challenging. By modulating the learned latent distribution, the framework systematically covers low-, medium-, and high-risk regimes, effectively bridging the sim-to-real gap.

The significance of this work lies in establishing a new pathway for principled stress-testing of autonomous systems. By combining data-driven motion priors with knowledge-driven semantic reasoning, the framework enables interpretable and controllable generation of diverse safety-critical scenarios. This allows for rigorous validation under rare but consequential events, addressing the persistent data scarcity problem in autonomous driving development and providing a scalable solution for evaluating system robustness against complex, real-world risks.
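
As a rough illustration of how LLM-supplied weights might combine the metrics named in the summary (TTC, minimum lateral distance, yaw rate) into a single objective, here is a minimal Python sketch. The summary does not give the paper's loss, so the functional form, the normalization caps, the yaw-rate limit, and every name used (RiskWeights, time_to_collision, risk_loss) are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskWeights:
    """Hypothetical container for weights an LLM might return for one scene."""
    ttc: float        # weight on Time-to-Collision
    lateral: float    # weight on minimum lateral distance
    yaw_rate: float   # weight on the yaw-rate plausibility penalty


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Longitudinal TTC in seconds; infinite when the gap is not closing."""
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps


def risk_loss(ttc_s, min_lateral_m, yaw_rate_rps, weights,
              ttc_cap_s=10.0, lateral_cap_m=5.0, yaw_limit_rps=0.5):
    """Assumed weighted loss: lower values mean riskier yet plausible scenes."""
    # Smaller TTC and lateral gap reduce the loss (more safety-critical).
    ttc_term = min(ttc_s, ttc_cap_s) / ttc_cap_s
    lateral_term = min(min_lateral_m, lateral_cap_m) / lateral_cap_m
    # Yaw rates beyond the assumed limit are penalized as implausible.
    yaw_excess = max(0.0, abs(yaw_rate_rps) - yaw_limit_rps)
    return (weights.ttc * ttc_term
            + weights.lateral * lateral_term
            + weights.yaw_rate * yaw_excess ** 2)


# Illustrative weights and two hand-made candidate scenarios (not paper data).
weights = RiskWeights(ttc=1.0, lateral=0.5, yaw_rate=2.0)
calm = risk_loss(time_to_collision(20.0, 5.0), 2.0, 0.1, weights)
tense = risk_loss(time_to_collision(5.0, 5.0), 0.5, 0.2, weights)
```

Under these assumed weights the tense candidate scores lower than the calm one, so a minimizer over the latent variable would be pulled toward it, while the yaw-rate term discourages scenarios that reach low TTC through implausible maneuvers.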

Key finding

The proposed framework substantially increases the coverage of high-risk and long-tail events and improves consistency between simulated and real-world traffic distributions compared to existing methods.

Methodology

simulation_modeling

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-04
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	success	—	—	—	1	2026-05-28
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

generative ai voice assistants

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: crash risk outcomes
Theoretical Contribution: computational model