A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; Ukkusuri, Satish V. · 2015 · OpenAlex-citations

DOI: 10.1016/j.trc.2014.12.009

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the challenge of optimizing network-wide traffic signal control by proposing a coordinated multi-agent reinforcement learning framework. The authors identify that existing adaptive control systems are often computationally expensive or fail to account for dynamic feedback, while previous reinforcement learning approaches typically treat intersections as independent agents. To overcome these limitations, the study introduces a Junction-Tree Algorithm (JTA) based reinforcement learning method. This approach models traffic signals as intelligent agents that coordinate their decisions to maximize system-wide performance, offering an exact inference procedure capable of handling general cyclic road networks, unlike the max-plus algorithm, which struggles with convergence in such structures.

The methodology integrates the JTA with a reinforcement learning framework where traffic controllers act as agents interacting with a stochastic environment simulated in VISSIM. The state of each signal phase is discretized into three congestion levels based on residual queue lengths, and actions involve selecting signal phases within minimum and maximum green time constraints. The JTA is employed to compute the best joint actions for all coordinated intersections by decomposing the network into local sub-problems and propagating messages through a constructed junction tree. The algorithm utilizes an average reward technique (R-MART) to balance exploration and exploitation during the learning phase.

The experimental design tests the algorithm on a network of 18 signalized intersections under low, medium, and high traffic demand scenarios. Performance is evaluated against independent Q-learning, real-time adaptive Longest-Queue-First (LQF) controllers, and fixed timing plans, using metrics such as average delay, number of stops, and vehicular emissions estimated via the MOVES2010 simulator.

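The exact joint-action computation described above can be illustrated with a minimal sketch. It assumes a hypothetical ring of four intersections A-B-C-D-A, two signal phases per intersection, and made-up pairwise payoffs; none of these names or values come from the paper. Triangulating the cycle with a chord A-C gives two cliques, {A, B, C} and {A, C, D}, joined by the separator {A, C}. One max-sum message across that separator is then enough to recover the best joint action, and the sketch checks the result against brute-force enumeration.

```python
from itertools import product

NODES = ("A", "B", "C", "D")   # hypothetical intersections on a 4-cycle
ACTIONS = (0, 1)               # hypothetical phase indices per intersection

# Made-up pairwise payoffs: PAYOFF[(i, j)][(a_i, a_j)] for each road link i-j.
PAYOFF = {
    ("A", "B"): {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0},
    ("B", "C"): {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 0.0},
    ("C", "D"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 3.0},
    ("D", "A"): {(0, 0): 5.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
}


def total_payoff(q, joint):
    """Sum of all pairwise payoffs under a full joint action."""
    return sum(table[(joint[i], joint[j])] for (i, j), table in q.items())


def brute_force_argmax(q):
    """Reference answer: enumerate every joint action."""
    candidates = (dict(zip(NODES, acts))
                  for acts in product(ACTIONS, repeat=len(NODES)))
    best = max(candidates, key=lambda joint: total_payoff(q, joint))
    return best, total_payoff(q, best)


def junction_tree_argmax(q):
    """Max-sum on the two-clique junction tree {A,B,C} -- {A,C} -- {A,C,D}."""
    # Leaf clique {A, C, D}: maximize out D, send a message over separator {A, C}.
    message, best_d = {}, {}
    for a, c in product(ACTIONS, repeat=2):
        scores = {d: q[("C", "D")][(c, d)] + q[("D", "A")][(d, a)]
                  for d in ACTIONS}
        best_d[(a, c)] = max(scores, key=scores.get)
        message[(a, c)] = scores[best_d[(a, c)]]

    # Root clique {A, B, C}: combine its own payoffs with the incoming message.
    def root_score(abc):
        a, b, c = abc
        return q[("A", "B")][(a, b)] + q[("B", "C")][(b, c)] + message[(a, c)]

    a, b, c = max(product(ACTIONS, repeat=3), key=root_score)
    # Backtrack: read off the D action that produced the winning message entry.
    joint = {"A": a, "B": b, "C": c, "D": best_d[(a, c)]}
    return joint, root_score((a, b, c))


jt_joint, jt_value = junction_tree_argmax(PAYOFF)
bf_joint, bf_value = brute_force_argmax(PAYOFF)
print(jt_joint, jt_value)
```

In this toy case the message-passing result matches enumeration while only ever maximizing over clique-sized tables, which is the property that makes the approach attractive on cyclic networks. A full implementation would also need triangulation and clique construction for an arbitrary network, which this sketch hard-codes.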
The results demonstrate that the JTA-based algorithm significantly outperforms independent Q-learning, real-time adaptive LQF, and fixed timing plans in terms of average delay and stopped delay across all congestion levels. Specifically, at high congestion, JTA reduced average delay to 11.22 seconds per vehicle compared to 14.98 seconds for Q-learning and 14.93 seconds for LQF. While JTA generally reduced the number of stops, it performed slightly worse than LQF on this metric at low and high congestion levels, likely due to its focus on minimizing queue lengths to prevent spill-back rather than explicitly minimizing stops. The study also confirms that JTA provides better convergence and performance than the max-plus algorithm in cyclic networks. Furthermore, the coordinated control reduced vehicular emissions (CO, CO2, and NOx) as well as fuel consumption.

The significance of this work lies in providing a computationally efficient, exact inference method for coordinated traffic control that scales to complex, cyclic road networks. By leveraging the coordination of agents, the proposed framework achieves superior system-level performance compared to decentralized or non-learning approaches. The findings suggest that coordinated reinforcement learning is a viable strategy for next-generation intelligent transportation systems, particularly in connected vehicle environments where infrastructure-to-infrastructure communication enables real-time coordination. The integration of environmental impact assessment further highlights the potential of such algorithms to support sustainable mobility goals by reducing both travel delays and vehicular emissions.
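To put the high-congestion delay figures quoted above in relative terms, the short calculation below computes the fractional reduction in average delay of JTA against each baseline. The helper name `relative_reduction` is introduced here for illustration only; the three delay values are the ones reported in the summary.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Average delay in seconds per vehicle at high congestion, as quoted in the summary.
JTA_DELAY = 11.22
BASELINE_DELAYS = {"independent Q-learning": 14.98, "LQF": 14.93}

reductions = {name: relative_reduction(delay, JTA_DELAY)
              for name, delay in BASELINE_DELAYS.items()}

for name, reduction in reductions.items():
    print(f"JTA vs {name}: {reduction:.1%} lower average delay")
```

Both comparisons come out at roughly a quarter less delay per vehicle, which is the size of the gap behind the summary's "significantly outperforms" wording for this scenario.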

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	OpenAlex-citations	—	—	1	2026-06-20
archive	success	unpaywall	—	—	2	2026-06-26
extract	success	pdftotext	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-26
chunk	success	chunk	—	—	1	2026-06-26
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-26
enrich	failed	—	—	—	4	2026-06-26
promote	success	—	—	—	1	2026-06-20
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-26
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

traffic density