Safe Reinforcement Learning on Autonomous Vehicles
DOI: 10.1109/iros.2018.8593420
Archive status: archived. Pipeline status: cataloged, verified.
Summary
This paper addresses the challenge of applying reinforcement learning (RL) to safety-critical autonomous driving systems, specifically focusing on navigating unsigned intersections. Standard RL methods rely on unconstrained exploration, which poses significant risks in physical environments where failure can lead to collisions. While existing safe RL approaches often provide formal guarantees, they typically rely on idealized models that struggle with the stochasticity and high dimensionality of real-world traffic. The authors propose a framework that uses prediction models to constrain the exploration space, allowing an RL agent to learn efficient policies while ensuring safety by masking actions predicted to cause collisions.

The methodology formulates intersection handling as a stochastic game involving multiple agents. To manage computational complexity, the authors assume agents execute high-level actions (intentions) rather than continuous low-level controls, allowing for bounded closed-loop corrections. Safety is enforced by predicting the trajectories of the ego vehicle and surrounding traffic over a fixed horizon. Using probabilistic models based on Kalman filters and constant velocity assumptions, the system calculates safety margins derived from the variance of predicted trajectories. Actions that result in predicted overlaps between vehicle regions are masked, preventing the RL agent from selecting them. The authors provide probabilistic safety guarantees using Chebyshev’s inequality, acknowledging that these guarantees are weaker than strict formal methods but more applicable to noisy, high-dimensional systems.

Experiments were conducted using the SUMO traffic simulator and Deep Q-Networks (DQN). The authors trained two distinct policies: one to minimize disruption to other vehicles (measured by traffic braking) and another to maximize the minimum safety distance to other vehicles. The simulation involved traffic density controlled by emission probabilities and vehicles following the Intelligent Driver Model. The results demonstrated that both policies achieved zero collisions during training. The policy maximizing safety margins learned to take larger gaps in sparse traffic but seize opportunities in dense traffic, while the policy minimizing braking optimized for smooth traffic flow. The authors also validated the approach on a real autonomous vehicle in Mountain View, California, using Lidar and radar data for perception, confirming that the simulation-trained policy could be deployed effectively.

The significance of this work lies in providing a general, intuitive framework for safe RL that scales to complex, multi-agent environments. By using prediction to mask unsafe actions, the method allows the agent to explore the safe subspace freely, balancing safety with efficiency. The authors conclude that while the safety guarantees are not as strong as those from reachability-based methods, the prediction-based approach is more robust to noise and better suited for real-world applications where adversarial behavior is not assumed. This enables autonomous vehicles to learn nuanced driving behaviors, such as adaptive standoff distances, without compromising safety during the learning process.
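The prediction-based masking described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a constant-velocity rollout, circular vehicle footprints, a position standard deviation that grows linearly with time (standing in for a Kalman filter's predicted variance), and a Chebyshev bound applied to the radial position error as a simplification. All names and numeric defaults (`Vehicle`, `sigma_rate`, `delta`, the 4-second horizon) are hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class Vehicle:
    x: float
    y: float
    vx: float
    vy: float
    radius: float = 2.0      # assumed circular footprint (m)
    sigma0: float = 0.5      # assumed initial position std dev (m)
    sigma_rate: float = 0.3  # assumed std-dev growth per second (m/s)


def chebyshev_margin(sigma, delta):
    # Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k**2.
    # Choosing k = 1 / sqrt(delta) bounds that probability by delta.
    return sigma / math.sqrt(delta)


def predicted_region(v, t, delta):
    """Centre and inflated radius of v at time t under constant velocity."""
    sigma = v.sigma0 + v.sigma_rate * t
    return v.x + v.vx * t, v.y + v.vy * t, v.radius + chebyshev_margin(sigma, delta)


def is_safe(ego, others, horizon=4.0, dt=0.5, delta=0.05):
    """True if no predicted regions overlap at any step over the horizon."""
    for i in range(1, int(round(horizon / dt)) + 1):
        t = i * dt
        ex, ey, er = predicted_region(ego, t, delta)
        for other in others:
            ox, oy, orad = predicted_region(other, t, delta)
            if math.hypot(ex - ox, ey - oy) < er + orad:
                return False
    return True


def safe_actions(ego, others, actions, **kwargs):
    """Keep only high-level actions whose predicted rollout stays clear.

    `actions` maps an action name to the ego velocity (vx, vy) it implies.
    """
    return [
        name
        for name, (vx, vy) in actions.items()
        if is_safe(replace(ego, vx=vx, vy=vy), others, **kwargs)
    ]


def masked_greedy(q_values, safe):
    """Greedy choice restricted to unmasked actions; None if all are masked."""
    return max(safe, key=q_values.__getitem__) if safe else None


# Ego approaches the intersection from the south; traffic crosses from the west.
ego = Vehicle(x=0.0, y=-20.0, vx=0.0, vy=0.0)
traffic = [Vehicle(x=-20.0, y=0.0, vx=10.0, vy=0.0)]
actions = {"wait": (0.0, 0.0), "go": (0.0, 10.0)}

allowed = safe_actions(ego, traffic, actions)
choice = masked_greedy({"wait": 0.1, "go": 0.9}, allowed)
```

In this scenario both vehicles would reach the intersection centre after two seconds, so `go` is masked and the agent is left with `wait` even though `go` has the higher Q-value. A smaller `delta` widens the margin and masks more actions, which is the trade-off between the strength of the probabilistic guarantee and how conservatively the agent behaves.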
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-18 |
| archive | success | semantic_scholar | — | — | 6 | 2026-06-25 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-20 |
| chunk | success | chunk | — | — | 1 | 2026-06-20 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-20 |
| enrich | success | openalex | — | — | 1 | 2026-06-20 |
| promote | success | — | — | — | 1 | 2026-06-18 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-20 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Theoretical Contribution: computational model