Reinforcement Learning Agent under Partial Observability for Traffic Light Control in Presence of Gridlocks
DOI: 10.29007/bdgn
Archive status: archived; pipeline status: cataloged, verified
Summary
This paper addresses the challenge of automating traffic light control in complex, gridlock-prone urban environments, focusing on the Sathorn Road network in Bangkok. The research is motivated by the limitations of the current manual control, which relies heavily on human expertise to resolve gridlocks caused by high demand from nearby educational institutions and insufficient link capacity. Although reinforcement learning (RL) has been applied to traffic control, existing studies often overlook practical constraints in developing countries, particularly the economic limits on sensor coverage. This work therefore investigates how effective an RL agent is under partial observability, where the agent cannot perceive the entire network state, a realistic constraint for physical deployment.
The study uses the Chula-Sathorn SUMO Simulator (Chula-SSS), a calibrated microscopic traffic simulation dataset of the Sathorn Road network during the morning rush hours (6 AM to 9 AM). The problem is formulated as a Partially Observable Markov Decision Process (POMDP). The agent’s state consists of occupancy values from 21 simulated sensor cells (18 upstream, 3 downstream), reflecting limited physical sensor placement, together with the current traffic signal phase. The action space comprises nine traffic signal phases derived from expert heuristics. The reward function balances vehicle throughput against weighted observed occupancy to penalize congestion. The agent uses the Ape-X Deep Q-Network architecture, incorporating Double Q-Learning, Prioritized Experience Replay, Dueling Networks, and multi-step learning. Experiments were conducted with the RLlib library, training the agent for 250 epochs across 1,000 simulated episodes. Results indicate that the agent converges and learns effective policies despite partial observability.
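The POMDP formulation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the one-hot phase encoding, and the exact reward form (throughput minus a penalty coefficient times a per-cell weighted occupancy sum) are hypothetical, since the summary gives only the sensor counts, the nine-phase action space, and a qualitative description of the reward.

```python
from typing import List, Sequence

NUM_UPSTREAM = 18    # upstream sensor cells (count from the summary)
NUM_DOWNSTREAM = 3   # downstream sensor cells (count from the summary)
NUM_PHASES = 9       # expert-derived signal phases (count from the summary)


def build_observation(upstream_occ: Sequence[float],
                      downstream_occ: Sequence[float],
                      current_phase: int) -> List[float]:
    """Hypothetical observation: 21 occupancy values plus a one-hot phase.

    The one-hot encoding of the phase is an assumption; the summary only
    states that the current phase is part of the state.
    """
    if len(upstream_occ) != NUM_UPSTREAM or len(downstream_occ) != NUM_DOWNSTREAM:
        raise ValueError("expected 18 upstream and 3 downstream occupancy values")
    if not 0 <= current_phase < NUM_PHASES:
        raise ValueError("phase index out of range")
    phase_one_hot = [1.0 if i == current_phase else 0.0 for i in range(NUM_PHASES)]
    # Only the sensed cells are visible to the agent (partial observability).
    return list(upstream_occ) + list(downstream_occ) + phase_one_hot


def reward(throughput: int,
           occupancies: Sequence[float],
           cell_weights: Sequence[float],
           occupancy_penalty: float) -> float:
    """Hypothetical reward: vehicles served minus penalized weighted occupancy."""
    if len(occupancies) != len(cell_weights):
        raise ValueError("one weight per sensor cell is required")
    congestion = sum(w * o for w, o in zip(cell_weights, occupancies))
    return throughput - occupancy_penalty * congestion
```

For example, `build_observation([0.2] * 18, [0.5] * 3, current_phase=4)` returns a 30-element vector (21 occupancies plus a 9-way one-hot phase), and `reward(12, [0.5] * 21, [1.0] * 21, occupancy_penalty=0.5)` evaluates to 12 - 0.5 × 10.5 = 6.75. Raising `occupancy_penalty` makes congestion weigh more heavily relative to throughput, which mirrors the throughput versus jam-length trade-off the summary reports across reward settings.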
The agent’s behavior aligns with human expert strategies: it keeps jam lengths low at the critical Surasak Intersection to prevent gridlock propagation, while tolerating higher congestion on South Sathorn Road, which it treats as a buffer zone. Performance varied with the reward function parameters: agents with lower occupancy penalties achieved higher vehicle throughput, whereas agents with higher occupancy penalties kept jam lengths lower on specific approaches such as CharoenRat Road. The agent also adjusted green light durations dynamically to manage queue spillback and mitigate the risk of closed-loop gridlock.
The significance of this work lies in demonstrating that RL agents can learn viable traffic control policies under realistic physical constraints such as limited sensor data and partial observability. The findings suggest that RL can replicate or augment human expert heuristics in complex, interconnected gridlock scenarios. This supports the potential for deploying learning-based traffic control systems in developing countries, where infrastructure costs limit comprehensive sensor networks, provided that appropriate reward structures and observability constraints are accounted for.
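The summary does not say how green durations are represented, since the action space is a set of nine phases rather than timings. One common arrangement, assumed here and not confirmed by the source, is that the agent re-selects a phase at every fixed decision interval, so a green period lasts as long as the agent keeps choosing the same phase. The sketch below uses hypothetical function names and a hypothetical interval: it picks the greedy phase from a vector of Q-values and converts a sequence of chosen phases into green durations.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def greedy_phase(q_values: Sequence[float]) -> int:
    """Return the index of the highest Q-value (first one on ties)."""
    if not q_values:
        raise ValueError("q_values must be non-empty")
    return max(range(len(q_values)), key=q_values.__getitem__)


def green_durations(phase_sequence: Sequence[int],
                    decision_interval_s: float) -> List[Tuple[int, float]]:
    """Collapse consecutive identical phase choices into (phase, green seconds).

    Assumes one decision per fixed interval, and that re-selecting the active
    phase simply extends its green time. Yellow/all-red transition time is
    ignored in this sketch.
    """
    if decision_interval_s <= 0:
        raise ValueError("decision interval must be positive")
    return [(phase, len(list(run)) * decision_interval_s)
            for phase, run in groupby(phase_sequence)]
```

For instance, `green_durations([2, 2, 2, 5, 5, 2], 10.0)` returns `[(2, 30.0), (5, 20.0), (2, 10.0)]`: the same phase can receive a short or long green depending on how long the policy holds it, which is one way a phase-selection agent could lengthen greens to clear a spilling queue.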
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-20 |
| archive | success | unpaywall | — | — | 2 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-20 |
| chunk | success | chunk | — | — | 1 | 2026-06-20 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-20 |
| promote | success | — | — | — | 1 | 2026-06-20 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-20 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.