Learning scalable multi-agent coordination by spatial differentiation for traffic signal control
DOI: 10.1016/j.engappai.2021.104165
archive: archived; pipeline: cataloged; verified
Summary
This paper addresses scalable, globally optimal traffic signal control (TSC) in large-scale road networks using Multi-Agent Reinforcement Learning (MARL). While existing Deep Reinforcement Learning (DRL) methods perform well for single intersections, coordinating multiple intersections remains difficult. Current MARL approaches often rely on centralized settings or on Graph Attention Networks (GAT) that require gathering global information, which hinders scalability and practical deployment. The authors identify a gap in existing research: most methods focus on sharing observations but neglect the consequences of an agent's decisions for neighboring intersections. To close this gap, the paper proposes a decentralized coordination framework called $\gamma$-Reward, which uses spatial differentiation to let agents coordinate without a central controller.
The proposed method treats each intersection as an independent DRL agent using Dueling-Double-Deep Q-Networks (D3QN). The core innovation is the "Spatial Differentiation" mechanism, which amends an agent's local reward based on the temporal-spatial impact of its actions on neighboring intersections. Specifically, the reward function incorporates a penalty term derived from the change in traffic capacity (measured as waiting vehicles) at adjacent intersections after a delay span $n$. This allows agents to account for the downstream effects of their signal phases. To handle varying intersection capacities and road lengths, the authors introduce an attention mechanism ($\gamma$-Attention-Reward) that dynamically weights the influence of neighbors. The framework operates in a decentralized manner, using a replay buffer to store trajectories and correct rewards retrospectively, thereby decoupling the road network while maintaining coordination. Theoretical analysis proves that the model converges to a Nash equilibrium.
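As a rough illustration of the retrospective reward correction described in the summary, the sketch below holds each transition for $n$ steps and then subtracts a penalty proportional to the growth in waiting vehicles at neighboring intersections. It is not the paper's exact formula: the names `amended_reward` and `DelayedRewardBuffer`, the sign convention, the default `gamma=0.5`, and the softmax weighting that stands in for the attention variant are all assumptions made for illustration.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def amended_reward(local_reward, waiting_before, waiting_after,
                   gamma=0.5, scores=None):
    """Subtract a scaled penalty for queue growth at neighbors.

    waiting_before / waiting_after: waiting-vehicle counts at each
    neighbor at step t and at step t + n (hypothetical inputs).
    scores: optional per-neighbor attention scores (assumption);
    uniform weights are used when None.
    """
    k = len(waiting_before)
    if k == 0:
        return local_reward  # no neighbors, nothing to amend
    weights = softmax(scores) if scores is not None else [1.0 / k] * k
    growth = [after - before
              for before, after in zip(waiting_before, waiting_after)]
    penalty = sum(w * g for w, g in zip(weights, growth))
    return local_reward - gamma * penalty


class DelayedRewardBuffer:
    """Holds transitions for n steps, then releases them amended."""

    def __init__(self, delay_n, gamma=0.5):
        self.delay_n = delay_n
        self.gamma = gamma
        self.pending = deque()  # transitions still waiting for step t + n
        self.ready = []         # transitions with amended rewards

    def push(self, state, action, local_reward, next_state, neighbor_waiting):
        self.pending.append(
            (state, action, local_reward, next_state, list(neighbor_waiting))
        )
        if len(self.pending) > self.delay_n:
            # The oldest entry is now exactly delay_n steps old, so the
            # current neighbor counts serve as its "after" observation.
            s, a, r, s2, before = self.pending.popleft()
            fixed = amended_reward(r, before, neighbor_waiting, self.gamma)
            self.ready.append((s, a, fixed, s2))
```

In this sketch an agent would call `push` once per control step with the waiting counts reported by its neighbors, and sample training batches only from `ready`, so each agent needs information from adjacent intersections alone.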
Simulation results demonstrate that the proposed $\gamma$-Reward and $\gamma$-Attention-Reward models maintain state-of-the-art performance across various road network configurations. Crucially, the method achieves these results in a fully decentralized setting, unlike many competing approaches that require centralized computation. The spatial differentiation mechanism effectively allows agents to collaborate by considering the future states of surrounding intersections, leading to improved global traffic efficiency. The study confirms that this approach offers superior scalability compared to centralized MARL algorithms like QMIX or VDN, as it does not rely on a global perspective or fixed neighborhood scopes.
The significance of this work lies in providing a practical, scalable solution for urban traffic management. By replacing centralized coordination with a decentralized spatial differentiation mechanism, the method aligns better with real-world infrastructure where a central controller is often impractical. The ability to achieve global optimization through local, networked agents addresses the non-stationarity issues common in independent reinforcement learning. This contribution advances the field of MARL by demonstrating that effective coordination can be achieved through reward shaping based on spatial-temporal dependencies, offering a robust alternative to graph-based observation sharing methods.
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | OpenAlex-citations | — | — | 1 | 2026-06-20 |
| archive | success | semantic_scholar | — | — | 6 | 2026-06-26 |
| extract | success | pdftotext | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-26 |
| chunk | success | chunk | — | — | 1 | 2026-06-26 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-26 |
| enrich | failed | — | — | — | 4 | 2026-06-26 |
| promote | success | — | — | — | 1 | 2026-06-20 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-26 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.