DR2L: Surfacing Corner Cases to Robustify Autonomous Driving via Domain Randomization Reinforcement Learning

Niu, Haoyi; Hu, Jianming; Cui, Zheyu; Zhang, Yi · 2021 · OpenAlex-citations

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the "Sim2real gap" in autonomous driving, where deep reinforcement learning (DeepRL) models trained in simulators fail to generalize to real-world conditions, particularly regarding rare and risky "corner cases." The authors propose Domain Randomization Reinforcement Learning (DR2L), a framework designed to robustify autonomous vehicle policies by simultaneously surfacing harder simulated scenarios and training agents to adapt to them. The motivation stems from the inefficiency and danger of collecting real-world data for edge cases, as well as the limitations of standard Domain Randomization (DR) methods that often use uniform or static randomization distributions.

The methodology employs a bidirectional optimization loop within the SUMO traffic simulator. The environment generator uses an Automatic Domain Randomization (AutoDR) approach to dynamically adjust the distribution boundaries of randomized parameters, specifically the initial velocities of neighboring vehicles. The system alternates between "episode sampling" (training on random parameters) and "boundary sampling" (testing performance at distribution limits). Based on performance metrics at the boundaries, the algorithm widens or narrows the randomization range to create a curriculum of increasingly difficult scenarios. The agent is a Deep Q-Network (DQN) that observes state variables including relative distances, velocities, and randomized parameters, and selects actions such as acceleration, deceleration, or lane changes. The reward function is carefully designed to balance safety (collision avoidance) and efficiency (speed), ensuring that high rewards are not achieved through unsafe high-speed collisions.

Experimental results demonstrate that the DR2L model successfully identifies and trains on the hardest cases the agent can adapt to, as evidenced by the stabilization of distribution parameters and cumulative rewards during training. When evaluated against agents trained in static environments (Easy, Mid, and Hard fixed distributions), the DR2L-trained agent showed superior generalization. Specifically, agents trained in fixed environments failed to avoid collisions in harder test environments, whereas the DR2L agent maintained collision-free performance across all difficulty levels. Furthermore, the DR2L agent achieved higher average speeds (e.g., 21.07 m/s in the Easy environment) compared to agents trained in fixed hard environments (18.89 m/s), indicating that dynamic randomization yields policies that are both safer and more efficient than those trained on static, albeit difficult, scenarios.

The significance of this work lies in providing a scalable, simulation-based method for generating and training on corner cases without requiring real-world data. By automating the creation of a difficulty curriculum, DR2L bridges the gap between simulation and reality more effectively than uniform or fixed domain randomization. The findings suggest that dynamic, performance-guided randomization is essential for developing robust autonomous driving policies that can handle the variability and unpredictability of real-world traffic.
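The alternation between episode sampling and boundary sampling described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `AutoDRRange`, the thresholds, the step size, the buffer length, and the velocity limits are all hypothetical choices made for illustration, and the performance score is assumed to be a number in [0, 1] supplied by whatever evaluates the agent.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AutoDRRange:
    """Adaptive range for one randomized parameter, e.g. the initial
    velocity (m/s) of a neighboring vehicle. All defaults are assumptions."""
    low: float
    high: float
    step: float = 1.0        # how far a boundary moves per update
    min_low: float = 0.0     # hard limits the range may never cross
    max_high: float = 40.0
    widen_at: float = 0.8    # mean boundary score that triggers widening
    narrow_at: float = 0.4   # mean boundary score that triggers narrowing
    buffer_size: int = 5     # boundary evaluations averaged per update
    _buffers: dict = field(default_factory=lambda: {"low": [], "high": []})

    def sample_episode(self, rng: random.Random) -> float:
        # Episode sampling: draw the parameter uniformly inside the range.
        return rng.uniform(self.low, self.high)

    def sample_boundary(self, rng: random.Random):
        # Boundary sampling: pin the parameter to one edge of the range.
        side = rng.choice(["low", "high"])
        return side, getattr(self, side)

    def report(self, side: str, score: float) -> None:
        # Collect boundary scores; update the boundary once the buffer fills.
        buf = self._buffers[side]
        buf.append(score)
        if len(buf) < self.buffer_size:
            return
        mean = sum(buf) / len(buf)
        buf.clear()
        if mean >= self.widen_at:
            self._move(side, outward=True)    # agent copes: widen the range
        elif mean <= self.narrow_at:
            self._move(side, outward=False)   # agent struggles: narrow it

    def _move(self, side: str, outward: bool) -> None:
        if side == "high":
            delta = self.step if outward else -self.step
            self.high = min(self.max_high, max(self.low, self.high + delta))
        else:
            delta = -self.step if outward else self.step
            self.low = max(self.min_low, min(self.high, self.low + delta))
```

In a training loop under these assumptions, most episodes would call `sample_episode` to set up the simulator, while a fraction would call `sample_boundary`, run the agent at that edge value, and pass the resulting score to `report`. The range then stops moving once boundary performance settles between the two thresholds, which matches the stabilization of distribution parameters that the summary mentions.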

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	OpenAlex-citations	—	—	1	2026-06-17
archive	success	semantic_scholar	—	—	6	2026-06-25
extract	success	cached	—	—	2	2026-06-25
clean	success	clean	—	—	1	2026-06-18
chunk	success	chunk	—	—	1	2026-06-18
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-18
promote	success	—	—	—	1	2026-06-17
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-25
tag	success	vector_similarity	—	—	6	2026-06-18
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-25; verification: verified.

Topics

Ranked by relevance to this paper.

simulator training transfer