Decentralized graph attention multi-agent reinforcement learning for adaptive urban traffic routing.

Mahmoud M; Meshoul S; Batouche M; Hammad M · 2026 · PubMed Central

DOI: 10.1038/s41598-026-56204-2

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the limitations of current genetic algorithm (GA)-based traffic routing systems, which fail to adapt quickly to real-time disruptions, lack coordination (leading to route oscillation), and cannot transfer across different city topologies. To solve these issues, the authors propose MA-GRL, a decentralized multi-agent reinforcement learning framework that combines Graph Attention Networks (GAT) with Multi-Agent Proximal Policy Optimization (MAPPO). The system is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), enabling vehicles to execute decentralized policies based on local observations while utilizing a Centralized Training with Decentralized Execution (CTDE) paradigm.

The MA-GRL architecture employs a 3-layer GAT encoder with 8 attention heads to process local traffic observations into 128-dimensional embeddings, capturing the structural context of a 3-hop neighborhood. This encoder handles variable action spaces as vehicles traverse intersections. A novel coordination reward is introduced to implicitly penalize simultaneous route switches by nearby agents, fostering stable cooperation without explicit communication. The model comprises approximately 497,000 shared parameters and was trained using MAPPO with Generalized Advantage Estimation.

Experiments were conducted using the SUMO simulator on scenarios representing Monaco, Luxembourg, and Bologna, involving 100 vehicles over 3,600-second episodes. The results demonstrate that MA-GRL reduces average travel time by 11.1% compared to GA-based routing (p < 0.001, Cohen’s d = 0.80). The framework exhibits robust adaptability, recovering from 10% road closures within 45 steps, significantly faster than GA methods that require re-optimization. Furthermore, the model achieves 87% zero-shot transfer retention when applied to unseen city topologies, validating the effectiveness of the structural graph representations.
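The summary states that training uses MAPPO with Generalized Advantage Estimation (GAE) but gives no hyperparameters. As a rough illustration of what GAE computes, the sketch below implements the standard recursion for a single trajectory in plain Python. The function name `gae_advantages` and the default `gamma` and `lam` values are assumptions chosen for illustration, not values from the paper, and the sketch ignores episode termination inside the trajectory.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    last_value: float,
    gamma: float = 0.99,  # assumed discount factor, not from the paper
    lam: float = 0.95,    # assumed GAE lambda, not from the paper
) -> List[float]:
    """Standard GAE recursion for one trajectory without early termination.

    rewards[t]  : reward received at step t
    values[t]   : critic's value estimate for the state at step t
    last_value  : critic's value estimate for the state after the final step
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the accumulated future term.
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of this and all later TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `gamma=1.0`, `lam=1.0`, and a critic that always predicts zero, the advantages reduce to the reward-to-go, so `gae_advantages([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 0.0, gamma=1.0, lam=1.0)` returns `[6.0, 5.0, 3.0]`. Setting `lam=0.0` instead yields the one-step TD errors.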
Ablation studies confirm that the coordination reward is critical, as its removal increases route oscillation by 202% and degrades travel time performance.

The study concludes that integrating graph neural networks with multi-agent reinforcement learning offers a scalable, adaptive, and transferable solution for urban traffic management. By addressing the specific failure modes of metaheuristic approaches (slow adaptation, coordination failure, and transfer brittleness), MA-GRL provides a viable path toward intelligent transportation systems that can alleviate congestion and its associated economic and environmental costs. The authors note limitations regarding the need for integration with vehicle navigation systems and hardware constraints on scalability beyond approximately 200 agents during training.
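The summary describes the coordination reward only qualitatively: it penalizes simultaneous route switches by nearby agents. The sketch below shows one way such a term could be written, purely as an illustration. The function name `coordination_penalty`, the Euclidean distance test, and the `radius` and `weight` parameters are all hypothetical. The paper's actual formulation may differ.

```python
from math import hypot
from typing import List, Tuple


def coordination_penalty(
    positions: List[Tuple[float, float]],
    switched: List[bool],
    radius: float = 100.0,  # hypothetical neighborhood radius (metres)
    weight: float = 0.1,    # hypothetical penalty per conflicting neighbor
) -> List[float]:
    """Hypothetical per-agent penalty for simultaneous route switches.

    An agent that switched routes this step is penalized once for every
    other agent within `radius` that also switched in the same step.
    Agents that kept their route receive no penalty.
    """
    penalties = []
    for i, (xi, yi) in enumerate(positions):
        if not switched[i]:
            penalties.append(0.0)
            continue
        # Count nearby agents that switched at the same time as agent i.
        conflicts = sum(
            1
            for j, (xj, yj) in enumerate(positions)
            if j != i and switched[j] and hypot(xi - xj, yi - yj) <= radius
        )
        penalties.append(0.0 - weight * conflicts)
    return penalties
```

Each agent can compute this term from information about its local neighborhood, which fits the summary's description of coordination without explicit communication. How the authors weight it against travel-time reward is not stated in the summary.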

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | PubMed Central | | | 1 | 2026-06-18 |
| archive | success | unpaywall | | | 2 | 2026-06-25 |
| extract | success | cached | | | 2 | 2026-06-26 |
| clean | success | clean | | | 1 | 2026-06-20 |
| chunk | success | chunk | | | 1 | 2026-06-20 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | | 1 | 2026-06-20 |
| enrich | success | openalex | | | 1 | 2026-06-20 |
| promote | success | | | | 1 | 2026-06-18 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | | | 6 | 2026-06-20 |
| verify | success | | | | 1 | 2026-06-26 |

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.