Hierarchical Reinforcement Learning for Dynamic Autonomous Vehicle Navigation at Intelligent Intersections

Sun, Qian; Zhang, Le; Yu, Huan; Zhang, Weijia; Mei, Yu; Xiong, Hui · 2023 · Unknown

DOI: 10.1145/3580305.3599839

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the challenge of jointly optimizing traffic signal control and autonomous vehicle (AV) navigation in mixed traffic environments, where human-driven and autonomous vehicles coexist. Existing approaches often rely on domain-specific heuristics or treat these tasks independently, failing to account for their interdependence. The authors propose NavTL, a learning-based framework designed to improve travel efficiency and reduce congestion by coordinating traffic light phases with dynamic AV rerouting. The motivation stems from the limitations of traditional optimization methods, which struggle with dynamic real-world scenarios, and the lack of prior machine learning studies that simultaneously control both signals and AV trajectories in heterogeneous traffic.

The methodology employs a graph-enhanced, multi-agent, decentralized bi-directional hierarchical reinforcement learning (HRL) framework. Traffic lights are modeled as "manager" agents operating at lower temporal resolutions, while AVs are "worker" agents operating at higher resolutions. Unlike traditional feudal RL, which assumes homogeneous tasks, NavTL handles heterogeneous agents with distinct state and action spaces. The framework utilizes a dynamic signal-vehicle graph to extract regional navigation intentions from AVs, which are propagated bottom-up to enhance the managers' state representations. Conversely, managers provide top-down guidance via goal vectors to steer AV actions. To facilitate cooperation among traffic signals, a static signal-signal graph with graph convolutional networks is used. Deep Q-Networks (DQN) are employed to learn state-action values for both agents, with rewards defined by negative queue length for signals and accumulated travel time for vehicles.

The study evaluates NavTL through extensive experiments on one synthetic dataset and two real-world network-level datasets. The results demonstrate that the proposed framework effectively improves travel efficiency and minimizes congestion compared to existing methods. By integrating bottom-up intention information from vehicles and top-down guidance from signals, the system successfully balances the inconsistent objectives of congestion reduction and route optimization. The experiments validate the superiority of NavTL in handling the coupled nature of traffic signal control and AV navigation, showing significant improvements in overall system performance.

The significance of this work lies in being the first to apply graph-enhanced hierarchical reinforcement learning to the coordinated control of traffic signals and autonomous vehicles in mixed traffic environments using real-world data. It introduces a novel bi-directional message-passing mechanism that enables effective collaboration between heterogeneous agents. This approach overcomes the limitations of static pathfinding and heuristic signal control, offering a scalable solution for intelligent transportation systems. The findings suggest that joint optimization through hierarchical RL can substantially enhance urban traffic efficiency, providing a robust foundation for future Cooperative Vehicle Infrastructure System (CVIS) developments.
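
As a rough illustration of the manager/worker structure described in the summary, the sketch below shows how the pieces might fit together. It is not the authors' implementation. Every function name, the `MANAGER_PERIOD` value, and the use of mean pooling in place of the paper's dynamic signal-vehicle graph are assumptions made for illustration. Only the negative-queue-length reward and the standard DQN bootstrap target are taken from the summary or from common practice.

```python
# Hypothetical sketch of NavTL-style bi-directional information flow.
# All names and constants here are illustrative assumptions.

MANAGER_PERIOD = 5  # assumption: a signal acts once per 5 vehicle steps


def signal_reward(queue_lengths):
    """Manager (traffic light) reward: negative total queue length."""
    return -float(sum(queue_lengths))


def manager_state(signal_obs, intentions, intent_dim):
    """Bottom-up: append pooled AV intention vectors to the signal observation.

    Mean pooling is a stand-in for the graph-based aggregation in the paper.
    """
    if intentions:
        pooled = [sum(col) / len(intentions) for col in zip(*intentions)]
    else:
        pooled = [0.0] * intent_dim
    return list(signal_obs) + pooled


def worker_state(local_obs, goal):
    """Top-down: append the manager's goal vector to the AV's observation."""
    return list(local_obs) + list(goal)


def td_target(reward, next_q_values, gamma, done=False):
    """Standard DQN bootstrap target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def schedule(num_steps, period=MANAGER_PERIOD):
    """Managers act every `period` steps; workers act at every step."""
    return ["manager+worker" if t % period == 0 else "worker"
            for t in range(num_steps)]
```

Keeping the two state builders separate mirrors the heterogeneous state spaces the summary mentions: a signal and a vehicle never share an input vector, and exchange information only through the pooled intentions and the goal vector.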

Key finding

The proposed NavTL framework significantly improves travel efficiency and reduces congestion at intelligent intersections by jointly controlling traffic signals and autonomous vehicle rerouting through a hierarchical reinforcement learning approach.

Methodology

simulation_modeling

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

Stage      Outcome  Tool               Model                     Prompt   Attempts  Completed
discover   success  author_sweep       -                         -        2         2026-05-28
archive    success  canonical_url      -                         -        1         2026-06-06
extract    success  cached             -                         -        3         2026-06-10
clean      success  clean              -                         -        1         2026-06-04
chunk      success  chunk              -                         -        1         2026-06-04
embed      success  embed              Qwen/Qwen3-Embedding-8B   -        1         2026-06-04
enrich     success  -                  -                         -        1         2026-05-28
promote    success  -                  -                         -        1         2026-06-04
summarize  success  llm                qwen3.6-27b-prismaquant   summ-v5  2         2026-06-10
tag        success  vector_similarity  -                         -        15        2026-06-11
verify     success  -                  -                         -        2         2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.