Reinforcement Learning Versus PDE Backstepping and PI Control for Congested Freeway Traffic
DOI: 10.1109/tcst.2021.3116796
Summary
This paper addresses stop-and-go congestion on a freeway segment by comparing reinforcement learning (RL) boundary controllers against established model-based control strategies. The motivation is a limitation of existing Lyapunov-based designs such as PDE backstepping, proportional (P), and proportional-integral (PI) control: they are designed for linearized traffic models and therefore offer only local stability guarantees. Their performance also depends on accurate knowledge of system parameters, and they adapt poorly to nonlinear dynamics or uncertain conditions. The authors therefore formulate the boundary control problem as an RL task, aiming for controllers that stabilize traffic without explicit knowledge of the system dynamics.
The study uses the macroscopic Aw-Rascle-Zhang (ARZ) model, a second-order traffic model consisting of two coupled quasi-linear partial differential equations (PDEs) for traffic density and velocity, which captures the oscillatory behavior of congested traffic. The control objective is to regulate the traffic state to a uniform equilibrium in the spatial L2 norm. The authors compare four control strategies: setpoint control, PDE backstepping, P control, and PI control, all derived from linearized models. For the RL approach, they employ Proximal Policy Optimization (PPO), a neural network-based policy gradient algorithm. The RL controllers are trained through repeated interaction with a numerical simulator of the nonlinear ARZ PDE, maximizing a reward tied to stabilization of the spatial L2 norm of the state.
The results show that RL controllers, trained over approximately one thousand episodes, nearly recover the stabilization performance of the backstepping, P, and PI controllers when the system dynamics are perfectly known. More significantly, with only partial knowledge (where the actual steady-state traffic conditions differ from those assumed in the design of the Lyapunov-based controllers), the RL controllers outperform the model-based approaches in some cases. This points to the RL method's ability to adapt to uncertain and changing conditions where fixed-gain controllers lose performance. The significance of this work lies in providing the first controller developed directly on the nonlinear ARZ PDE model, addressing a gap in previous literature that focused on linearized approximations.
The findings suggest that RL has strong potential for adaptive traffic management under uncertainty. However, the authors conclude that RL is not yet a safe or simple substitute for model-based control in real-world applications, for three reasons: the extensive training required on simulation models, the lack of guaranteed convergence during training, and the impossibility of performing collision-free iterative training on live traffic. Thus, while RL offers better adaptability, model-based methods remain critical for ensuring safety and stability in practical deployments.
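To make the reward idea in the summary concrete, the following is a minimal Python sketch of a stabilization-style reward: the negative of a discretized spatial L2 norm of the deviation from the uniform equilibrium. It is an illustration under stated assumptions, not the paper's exact reward. The function names, the normalization by the equilibrium values, and the Riemann-sum discretization are all hypothetical choices made here.

```python
import numpy as np


def spatial_l2_norm(rho, v, rho_star, v_star, dx):
    """Approximate the spatial L2 norm of the deviation from equilibrium.

    rho, v           : density and velocity sampled on a uniform grid
    rho_star, v_star : uniform equilibrium values (setpoint)
    dx               : grid spacing

    Assumption: deviations are scaled by the equilibrium values so that
    density and velocity errors are comparable in size.
    """
    rho_dev = (np.asarray(rho, dtype=float) - rho_star) / rho_star
    v_dev = (np.asarray(v, dtype=float) - v_star) / v_star
    # Riemann sum standing in for the integral over the freeway segment.
    return float(np.sqrt(np.sum(rho_dev**2 + v_dev**2) * dx))


def stabilization_reward(rho, v, rho_star, v_star, dx):
    """Hypothetical per-step reward: larger (closer to zero) when the
    traffic state is closer to the uniform equilibrium."""
    return -spatial_l2_norm(rho, v, rho_star, v_star, dx)
```

With a reward of this shape, a policy gradient method such as PPO is pushed toward boundary actions that shrink oscillations along the whole segment, which mirrors the L2 regulation objective used for the model-based controllers.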
Key finding
Reinforcement learning controllers nearly match model-based PDE backstepping, P, and PI controllers in stabilizing congested freeway traffic under perfect knowledge of the system dynamics, and outperform them in some cases when only partial knowledge is available.
Methodology
simulation_modeling
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | author_sweep | — | — | 2 | 2026-05-28 |
| archive | success | unpaywall | — | — | 2 | 2026-06-04 |
| extract | success | cached | — | — | 3 | 2026-06-10 |
| clean | success | clean | — | — | 1 | 2026-06-04 |
| chunk | success | chunk | — | — | 1 | 2026-06-04 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-04 |
| enrich | success | — | — | — | 1 | 2026-05-28 |
| promote | success | — | — | — | 1 | 2026-06-04 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 2 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 15 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Theoretical Contribution: computational model