Hierarchical Reinforcement Learning for Dynamic Autonomous Vehicle Navigation at Intelligent Intersections

Qian, Sun; Zhang, Le; Yu, Huan; Zhang, Weijia; Mei, Yu; Xiong, Hui · 2023 · Unknown

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the challenge of jointly optimizing traffic signal control and autonomous vehicle (AV) navigation in mixed traffic environments, where human-driven and autonomous vehicles coexist. Existing approaches often rely on domain-specific heuristics or treat these tasks independently, failing to account for their interdependence. The authors propose NavTL, a learning-based framework designed to improve travel efficiency and reduce congestion by coordinating traffic light phases with dynamic AV rerouting. The motivation stems from the limitations of traditional optimization methods, which struggle with dynamic real-world scenarios, and the lack of prior machine learning studies that simultaneously control both signals and AV trajectories in heterogeneous traffic.

The methodology employs a graph-enhanced, multi-agent, decentralized bi-directional hierarchical reinforcement learning (HRL) framework. Traffic lights are modeled as "manager" agents operating at lower temporal resolutions, while AVs are "worker" agents operating at higher resolutions. Unlike traditional feudal RL, which assumes homogeneous tasks, NavTL handles heterogeneous agents with distinct state and action spaces. The framework utilizes a dynamic signal-vehicle graph to extract regional navigation intentions from AVs, which are propagated bottom-up to enhance the managers' state representations. Conversely, managers provide top-down guidance via goal vectors to steer AV actions. To facilitate cooperation among traffic signals, a static signal-signal graph with graph convolutional networks is used. Deep Q-Networks (DQN) are employed to learn state-action values for both agents, with rewards defined by negative queue length for signals and accumulated travel time for vehicles.

The study evaluates NavTL through extensive experiments on one synthetic dataset and two real-world network-level datasets. The results demonstrate that the proposed framework effectively improves travel efficiency and minimizes congestion compared to existing methods. By integrating bottom-up intention information from vehicles and top-down guidance from signals, the system successfully balances the inconsistent objectives of congestion reduction and route optimization. The experiments validate the superiority of NavTL in handling the coupled nature of traffic signal control and AV navigation, showing significant improvements in overall system performance.

The significance of this work lies in being the first to apply graph-enhanced hierarchical reinforcement learning to the coordinated control of traffic signals and autonomous vehicles in mixed traffic environments using real-world data. It introduces a novel bi-directional message-passing mechanism that enables effective collaboration between heterogeneous agents. This approach overcomes the limitations of static pathfinding and heuristic signal control, offering a scalable solution for intelligent transportation systems. The findings suggest that joint optimization through hierarchical RL can substantially enhance urban traffic efficiency, providing a robust foundation for future Cooperative Vehicle Infrastructure System (CVIS) developments.
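
The bi-directional message passing described in the summary can be illustrated with a small sketch. This is not the authors' implementation: mean pooling for the bottom-up step, plain concatenation for both directions, and every name below (`mean_pool`, `bottom_up`, `top_down`, the toy graph and vectors) are assumptions chosen only to show the data flow between a signal "manager" and its AV "workers".

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector], dim: int) -> Vector:
    """Average equal-length vectors; return zeros when no AV is linked."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def bottom_up(signal_obs: Vector, av_intentions: List[Vector], dim: int) -> Vector:
    """Manager state: own observation plus pooled intentions of linked AVs.

    Pooling by mean and joining by concatenation are assumptions.
    """
    return signal_obs + mean_pool(av_intentions, dim)


def top_down(av_obs: Vector, goal: Vector) -> Vector:
    """Worker state: own observation plus the goal vector from its manager."""
    return av_obs + goal


# Hypothetical dynamic signal-vehicle graph: AVs currently linked to each signal.
signal_vehicle_edges: Dict[str, List[str]] = {"s0": ["av1", "av2"], "s1": []}
intentions: Dict[str, Vector] = {"av1": [1.0, 0.0], "av2": [0.0, 1.0]}
signal_obs: Dict[str, Vector] = {"s0": [3.0], "s1": [5.0]}

# Bottom-up: enrich each manager's state with regional AV intentions.
manager_states = {
    s: bottom_up(signal_obs[s], [intentions[a] for a in avs], dim=2)
    for s, avs in signal_vehicle_edges.items()
}

# Top-down: a placeholder goal vector (in practice produced by the manager).
goal_s0: Vector = [0.2, -0.1]
worker_state = top_down([12.0, 1.0], goal_s0)

print(manager_states["s0"])  # [3.0, 0.5, 0.5]
print(worker_state)          # [12.0, 1.0, 0.2, -0.1]
```

In this toy setup the signal with no linked AVs simply receives a zero intention vector, so its state keeps a fixed size as the signal-vehicle graph changes from step to step.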

Key finding

The proposed NavTL framework significantly improves travel efficiency and reduces congestion at intelligent intersections by jointly controlling traffic signals and autonomous vehicle rerouting through a hierarchical reinforcement learning approach.
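
As a rough illustration of the joint control loop, the sketch below pairs slow manager decisions with fast worker decisions and shows the signal reward next to a standard one-step DQN target. The interval `MANAGER_INTERVAL`, the function names, and the discount factor are hypothetical; the vehicle reward is left out because the summary does not state its exact form.

```python
from typing import List, Tuple

MANAGER_INTERVAL = 5  # hypothetical: signals act once every 5 simulation steps


def signal_reward(queue_lengths: List[int]) -> float:
    """Manager reward: negative total queue length over incoming lanes."""
    return -float(sum(queue_lengths))


def td_target(reward: float, next_q_values: List[float],
              gamma: float = 0.99, done: bool = False) -> float:
    """Standard one-step DQN target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def run_episode(num_steps: int) -> List[Tuple[str, int]]:
    """Record which agent type acts at each step of a two-timescale loop."""
    log: List[Tuple[str, int]] = []
    for t in range(num_steps):
        if t % MANAGER_INTERVAL == 0:
            log.append(("manager", t))  # low temporal resolution: pick a phase
        log.append(("worker", t))       # high temporal resolution: reroute AVs
    return log


schedule = run_episode(6)
print(signal_reward([2, 3, 0]))                # -5.0
print(td_target(-5.0, [1.0, 2.0], gamma=0.5))  # -4.0
print([t for who, t in schedule if who == "manager"])  # [0, 5]
```

Each agent type would keep its own Q-network and replay data; the loop above only shows when each one is queried.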

Methodology

simulation_modeling

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	success	—	—	—	1	2026-05-28
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

situational awareness