Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge
DOI: 10.48550/arxiv.2511.20726
archive: archived; pipeline: cataloged; verified
Summary
This paper addresses the critical challenge of validating autonomous driving systems against rare, long-tail events and complex multi-agent interactions, which are scarce in real-world datasets. Existing scenario generation methods face significant limitations: rule-based approaches lack behavioral diversity, data-driven methods cannot synthesize unseen risks, and recent generative models struggle to balance realism, controllability, and scalability. To bridge this gap, the authors propose a high-fidelity framework that integrates a Conditional Variational Autoencoder (CVAE) with a Large Language Model (LLM) to generate safety-critical scenarios that are both physically consistent and risk-sensitive.
The methodology employs a dual-module architecture. First, a CVAE-GNN module learns latent spatiotemporal representations from large-scale naturalistic datasets (nuScenes and highD). The encoder uses graph neural networks to capture multi-agent dependencies and lane topology, producing a latent variable that represents stochastic traffic behavior. The decoder reconstructs physically coherent base scenarios from this latent space. Second, an LLM-guided optimization module acts as an adversarial reasoning engine. The LLM parses structured scene descriptions (trajectories, map context, and relational cues) using chain-of-thought reasoning to identify event types and potential risks. It then dynamically generates adaptive loss weights for specific metrics, such as Time-to-Collision (TTC), minimum lateral distance, and yaw rate. These weights guide the optimization of the CVAE's latent space, ensuring the generated scenarios align with desired risk levels while maintaining physical plausibility.
Experiments conducted in the CARLA and SMARTS simulators demonstrate that the framework substantially increases the coverage of high-risk and long-tail events compared to existing rule-based or data-driven methods. The approach improves the consistency between simulated and real-world traffic distributions and exposes autonomous driving systems to interactions that are significantly more challenging. By modulating the learned latent distribution, the framework systematically covers low-, medium-, and high-risk regimes, effectively bridging the sim-to-real gap.
The significance of this work lies in establishing a new pathway for principled stress-testing of autonomous systems. By combining data-driven motion priors with knowledge-driven semantic reasoning, the framework enables interpretable and controllable generation of diverse safety-critical scenarios. This allows for rigorous validation under rare but consequential events, addressing the persistent data scarcity problem in autonomous driving development and providing a scalable solution for evaluating system robustness against complex, real-world risks.
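To make the adaptive loss weighting described in the summary concrete, here is a minimal Python sketch of how per-metric weights could combine TTC, minimum lateral distance, and yaw rate into one scalar objective. It is an illustration under stated assumptions, not the authors' implementation: the function names, the squared-deviation form of the loss, the one-lane TTC formula, and all numeric weights and targets are hypothetical and do not come from the paper.

```python
import math


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Simplified one-lane TTC: gap divided by closing speed.

    Returns math.inf when the follower is not closing on the leader.
    (Assumption: the paper may compute TTC differently.)
    """
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps


def weighted_risk_loss(metrics: dict, weights: dict, targets: dict) -> float:
    """Weighted sum of squared deviations between measured and target metrics.

    Hypothetical loss form; a larger weight makes the optimizer care more
    about pulling that metric toward its target risk level.
    Note: an infinite TTC yields an infinite loss in this simple form.
    """
    return sum(
        weights[name] * (metrics[name] - targets[name]) ** 2
        for name in weights
    )


# Hypothetical weights of the kind an LLM might emit for a cut-in event.
weights = {"ttc_s": 0.6, "min_lateral_m": 0.3, "yaw_rate_rps": 0.1}

# Hypothetical target values describing a "high-risk" regime.
targets = {"ttc_s": 1.5, "min_lateral_m": 0.5, "yaw_rate_rps": 0.2}

# Metrics measured on one decoded base scenario (illustrative numbers).
metrics = {
    "ttc_s": time_to_collision(gap_m=20.0, closing_speed_mps=5.0),
    "min_lateral_m": 1.5,
    "yaw_rate_rps": 0.2,
}

loss = weighted_risk_loss(metrics, weights, targets)
print(f"TTC = {metrics['ttc_s']:.1f} s, weighted risk loss = {loss:.2f}")
```

In a full pipeline of the kind the summary describes, a scalar like this would be minimized with respect to the CVAE latent variable, so that decoded scenarios drift toward the requested risk level; that optimization loop is omitted here.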
Key finding
The proposed framework substantially increases the coverage of high-risk and long-tail events and improves consistency between simulated and real-world traffic distributions compared to existing methods.
Methodology
simulation_modeling
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | author_sweep | — | — | 2 | 2026-05-28 |
| archive | success | canonical_url | — | — | 1 | 2026-06-04 |
| extract | success | cached | — | — | 3 | 2026-06-10 |
| clean | success | clean | — | — | 1 | 2026-06-04 |
| chunk | success | chunk | — | — | 1 | 2026-06-04 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-04 |
| enrich | success | — | — | — | 1 | 2026-05-28 |
| promote | success | — | — | — | 1 | 2026-06-04 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 2 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 15 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: crash risk outcomes
- Theoretical Contribution: computational model