Reinforcement Learning vs. Backstepping Control of Stop-and-Go Traffic

Yu, Huan; Park, Saehong; Moura, Scott; Bayen, Alexandre M.; Krstić, Miroslav · 2019 · arXiv (Cornell University)

archive: archived · pipeline: cataloged · verified


Summary

This paper compares Reinforcement Learning (RL) controllers against established model-based control strategies for mitigating stop-and-go traffic congestion on freeway segments. The authors focus on the Aw-Rascle-Zhang (ARZ) macroscopic traffic model, a second-order system of partial differential equations (PDEs) that captures traffic density and velocity dynamics more accurately than first-order models. Lyapunov-based designs such as PDE backstepping, Proportional (P), and Proportional-Integral (PI) control offer rigorous stability guarantees for linearized versions of the ARZ model, but those guarantees hold only locally, are sensitive to changes in model parameters, and rely on precise knowledge of the system. The study investigates whether RL, a model-free approach, can stabilize the nonlinear ARZ system and adapt to uncertainties where traditional methods may fail.

The authors formulate the PDE boundary control problem as a Markov Decision Process (MDP) and apply the Proximal Policy Optimization (PPO) algorithm, a neural network-based policy gradient method. The RL agents are trained through iterative interaction with a numerical simulator of the ARZ PDE. The reward is built from the spatial L2 norm of the traffic states, so that a smaller deviation from the uniform steady state earns a higher reward and maximizing the reward drives the system toward stabilization. The learned RL controllers are evaluated against four baseline strategies: setpoint control, PDE backstepping, P control, and PI control. The evaluation covers two scenarios: one with perfect knowledge of the traffic flow dynamics and model parameters, and one with partial knowledge, where the actual steady-state traffic conditions differ from those assumed in the design of the Lyapunov-based controllers.

When the system dynamics are perfectly known, the RL controllers nearly recover the stabilization performance of the backstepping, P, and PI controllers. More significantly, in the partial-knowledge scenarios, where the steady-state traffic is lighter or denser than assumed, the RL controllers outperform the model-based alternatives, which highlights the adaptive potential of RL.

The authors note, however, that these results required approximately one thousand training episodes in simulation. They emphasize that such iterative training cannot be performed safely in real-world traffic, because RL provides no guarantees about collisions or convergence, so RL is not yet a fully safe substitute for model-based control in live systems. The significance of this work lies in providing the first native controller for the nonlinear ARZ PDE model, filling a gap in existing literature that primarily addresses linearized versions. The study concludes that RL offers promising learning and adaptation capabilities for traffic management under uncertain and changing conditions, but it remains a complex approach that requires extensive simulation-based training. RL should therefore be viewed as a complementary tool whose advantage is adaptability, rather than as a simple replacement for mathematically rigorous, model-based control designs in current traffic infrastructure.
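
For orientation, the ARZ model named in the summary is commonly written in the form below. The notation is an assumption introduced here, not taken from this entry: \rho(x,t) is the traffic density, v(x,t) the velocity, p(\rho) a traffic pressure term, V(\rho) the equilibrium velocity, and \tau a relaxation time. The last line is only one plausible way to write an L2-norm-based stabilization reward around a uniform steady state (\rho^\star, v^\star) on a segment of length L; the paper's exact reward and normalization may differ.

```latex
\begin{align}
\partial_t \rho + \partial_x (\rho v) &= 0, \\
\partial_t \bigl(v + p(\rho)\bigr) + v\,\partial_x \bigl(v + p(\rho)\bigr) &= \frac{V(\rho) - v}{\tau}, \\
r(t) &= -\left( \frac{\lVert \rho(\cdot,t) - \rho^\star \rVert_{L^2}}{\rho^\star}
              + \frac{\lVert v(\cdot,t) - v^\star \rVert_{L^2}}{v^\star} \right),
\qquad
\lVert f \rVert_{L^2} = \left( \int_0^L f(x)^2 \, dx \right)^{1/2}.
\end{align}
```

With a reward of this shape, the maximum value of zero is attained exactly at the uniform steady state, which is why maximizing the reward corresponds to suppressing stop-and-go oscillations.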

Key finding

Reinforcement learning controllers trained on a simulation of the ARZ traffic model outperform traditional Lyapunov-based controllers in scenarios with partial knowledge of traffic dynamics, though they require extensive offline training.

Methodology

simulation_modeling
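
As a rough illustration of the simulation-based setup described in the summary, the sketch below evaluates an L2-norm-based stabilization reward on a discretized freeway segment. Everything in it is an assumption made for illustration: the function names `l2_norm` and `stabilization_reward`, the normalization by the steady-state values, the grid, and the numerical values are hypothetical and are not taken from the paper or its simulator.

```python
import numpy as np


def l2_norm(profile, dx):
    """Discrete spatial L2 norm of a profile sampled on a uniform grid."""
    profile = np.asarray(profile, dtype=float)
    return float(np.sqrt(np.sum(profile ** 2) * dx))


def stabilization_reward(rho, v, rho_star, v_star, dx):
    """Hypothetical reward: negative sum of normalized L2 deviations.

    The reward is zero at the uniform steady state (rho_star, v_star)
    and becomes more negative as the traffic state oscillates around it.
    """
    rho_dev = (np.asarray(rho, dtype=float) - rho_star) / rho_star
    v_dev = (np.asarray(v, dtype=float) - v_star) / v_star
    return -(l2_norm(rho_dev, dx) + l2_norm(v_dev, dx))


# Illustrative grid and steady state (assumed values, not from the paper).
x = np.linspace(0.0, 500.0, 101)   # positions along the segment, in metres
dx = x[1] - x[0]
rho_star, v_star = 0.12, 10.0      # vehicles per metre, metres per second

# A uniform profile and a profile with a stop-and-go style oscillation.
uniform_reward = stabilization_reward(
    np.full_like(x, rho_star), np.full_like(x, v_star), rho_star, v_star, dx
)
oscillating_reward = stabilization_reward(
    rho_star + 0.01 * np.sin(2 * np.pi * x / 500.0),
    v_star - 1.0 * np.sin(2 * np.pi * x / 500.0),
    rho_star, v_star, dx,
)

print(uniform_reward, oscillating_reward)
```

In an MDP formulation, a quantity like this would be computed from the simulator state after each control step and returned to the PPO agent as the per-step reward; the oscillating profile scores lower than the uniform one, so the agent is pushed toward damping the oscillation.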

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-28
archive	success	canonical_url	—	—	5	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	success	—	—	—	1	2026-05-28
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

traffic density

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Theoretical Contribution: computational model