Decentralized graph attention multi-agent reinforcement learning for adaptive urban traffic routing.

Mahmoud M; Meshoul S; Batouche M; Hammad M · 2026 · PubMed Central

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the limitations of current genetic algorithm (GA)-based traffic routing systems, which fail to adapt quickly to real-time disruptions, lack coordination leading to route oscillation, and cannot transfer across different city topologies. To solve these issues, the authors propose MA-GRL, a decentralized multi-agent reinforcement learning framework that combines Graph Attention Networks (GAT) with Multi-Agent Proximal Policy Optimization (MAPPO). The system is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), enabling vehicles to execute decentralized policies based on local observations while utilizing a Centralized Training with Decentralized Execution (CTDE) paradigm.

The MA-GRL architecture employs a 3-layer GAT encoder with 8 attention heads to process local traffic observations into 128-dimensional embeddings, capturing the structural context of a 3-hop neighborhood. This encoder handles variable action spaces as vehicles traverse intersections. A novel coordination reward is introduced to implicitly penalize simultaneous route switches by nearby agents, fostering stable cooperation without explicit communication. The model comprises approximately 497,000 shared parameters and was trained using MAPPO with Generalized Advantage Estimation.

Experiments were conducted using the SUMO simulator on scenarios representing Monaco, Luxembourg, and Bologna, involving 100 vehicles over 3,600-second episodes. The results demonstrate that MA-GRL reduces average travel time by 11.1% compared to GA-based routing (p < 0.001, Cohen’s d = 0.80). The framework exhibits robust adaptability, recovering from 10% road closures within 45 steps, significantly faster than GA methods that require re-optimization. Furthermore, the model achieves 87% zero-shot transfer retention when applied to unseen city topologies, validating the effectiveness of the structural graph representations. Ablation studies confirm that the coordination reward is critical, as its removal increases route oscillation by 202% and degrades travel time performance.

The study concludes that integrating graph neural networks with multi-agent reinforcement learning offers a scalable, adaptive, and transferable solution for urban traffic management. By addressing the specific failure modes of metaheuristic approaches (slow adaptation, coordination failure, and transfer brittleness), MA-GRL provides a viable path toward intelligent transportation systems that can alleviate congestion and its associated economic and environmental costs. The authors note limitations regarding the need for integration with vehicle navigation systems and hardware constraints on scalability beyond approximately 200 agents during training.
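To make the coordination reward described above more concrete, here is a minimal Python sketch of one way such a penalty could be shaped. This summary does not give the paper's actual formula, so everything here is an illustrative assumption: the function name `coordination_penalties`, the Euclidean `radius` used to define "nearby", the `weight` coefficient, and the choice to penalize an agent in proportion to the number of nearby agents that switched routes in the same step.

```python
from math import hypot
from typing import List, Sequence, Tuple


def coordination_penalties(
    positions: Sequence[Tuple[float, float]],
    switched: Sequence[bool],
    radius: float = 100.0,  # assumed neighborhood radius (metres), not from the paper
    weight: float = 0.1,    # assumed penalty coefficient, not from the paper
) -> List[float]:
    """Return one penalty per agent for the current step.

    An agent that switched routes is penalized once for every other agent
    within `radius` that also switched in the same step. Agents that kept
    their route receive no penalty.
    """
    penalties: List[float] = []
    for i, (xi, yi) in enumerate(positions):
        if not switched[i]:
            penalties.append(0.0)
            continue
        # Count nearby agents that switched routes at the same time.
        simultaneous = sum(
            1
            for j, (xj, yj) in enumerate(positions)
            if j != i and switched[j] and hypot(xi - xj, yi - yj) <= radius
        )
        penalties.append(-weight * simultaneous)
    return penalties


# Hypothetical example: two close vehicles and one distant vehicle all switch.
example = coordination_penalties(
    positions=[(0.0, 0.0), (50.0, 0.0), (500.0, 0.0)],
    switched=[True, True, True],
)
```

In this sketch each agent's penalty depends only on which nearby agents switched, which is in the spirit of the implicit, communication-free coordination described in the summary. During centralized training such a term would be added to each agent's travel-time reward, although how the authors weight or combine the two is not stated here.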

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	PubMed Central	—	—	1	2026-06-18
archive	success	unpaywall	—	—	2	2026-06-25
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-20
chunk	success	chunk	—	—	1	2026-06-20
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-20
enrich	success	openalex	—	—	1	2026-06-20
promote	success	—	—	—	1	2026-06-18
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-20
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

mental model of traffic