Urban Driving with Multi-Objective Deep Reinforcement Learning

Li, Changjian; Czarnecki, Krzysztof · 2019 · OpenAlex-citations

DOI: 10.65109/loxq3741

archive: archived pipeline: cataloged verified

Summary

This paper addresses the challenge of autonomous urban driving by proposing a multi-objective deep reinforcement learning (DRL) framework. The authors argue that traditional scalar reward functions struggle to balance competing driving aspects, such as safety, traffic rule compliance, and passenger comfort, often leading to poor exploration or unintuitive behavior. To solve this, they adapt the thresholded lexicographic Q-learning framework to a deep learning setting, allowing the agent to prioritize objectives hierarchically, ensuring safety constraints are met before optimizing for other goals. Additionally, the paper introduces an extension for factored Markov Decision Processes (MDPs) to improve data efficiency by exploiting the internal structure of the driving environment.

The methodology employs a multi-objective Deep Q-Network (DQN) agent that learns policies for distinct objectives separately. The authors modify the standard thresholded lexicographic approach by using an adaptive threshold based on the learned Q-values, which restricts the set of admissible actions at each state to those satisfying higher-priority constraints. This ensures the policy either satisfies all constraints or prioritizes the most critical ones, mimicking human decision-making.

To handle the complexity of large state spaces, the authors decompose the factored MDP into auxiliary tasks. They train separate Q-functions for individual surrounding vehicles (factored state spaces) and use these learned values as auxiliary features for the main Q-function. The state space is designed to be geometry-independent, incorporating ego vehicle kinematics, relative positions of surrounding vehicles, and topological road information rather than raw visual inputs.

The experimental results demonstrate that this approach significantly improves data efficiency compared to standard methods. By leveraging the factored Q-functions as features, the agent requires fewer training steps to achieve comparable performance. Furthermore, the study shows that the learned policy can be zero-shot transferred to a ring road scenario without retraining, maintaining performance despite the change in road geometry. This transferability is attributed to the use of topology-based state representations rather than scenario-specific visual data. The multi-objective architecture also allows for modular integration, where specific objectives can be implemented via rule-based systems if manual specification is easier, avoiding integration issues.

The significance of this work lies in its ability to decompose complex autonomous driving tasks into manageable, prioritized sub-tasks, addressing the exploration and reward design challenges inherent in single-objective RL. The proposed adaptive thresholding and factored MDP extensions provide a robust mechanism for handling safety-critical constraints while improving learning efficiency. The successful zero-shot transfer highlights the potential for creating generalizable driving policies that are not bound to specific map geometries, advancing the feasibility of deploying DRL-based agents in diverse urban environments.
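The prioritized action selection described in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading of the adaptive threshold, namely that each objective keeps only the actions whose Q-value lies within a slack of the best Q-value among the actions still admissible. The function name `lexicographic_action`, the `slacks` parameter, and the toy Q-values are hypothetical and chosen only for illustration.

```python
import numpy as np


def lexicographic_action(q_values, slacks):
    """Pick an action by filtering objectives in priority order.

    q_values: array of shape (num_objectives, num_actions); row 0 is the
              highest-priority objective (e.g. safety).
    slacks:   one non-negative slack per objective (hypothetical parameter).
              An action stays admissible for objective i if its Q-value is
              within slacks[i] of the best Q-value among the actions that
              survived all higher-priority objectives.
    """
    q_values = np.asarray(q_values, dtype=float)
    admissible = np.arange(q_values.shape[1])  # start with every action

    for q_i, slack in zip(q_values, slacks):
        best = q_i[admissible].max()
        # Keep only actions that are "good enough" for this objective.
        admissible = admissible[q_i[admissible] >= best - slack]

    # Break remaining ties with the lowest-priority objective.
    return int(admissible[np.argmax(q_values[-1][admissible])])


# Toy example: row 0 plays the role of safety, row 1 of progress.
toy_q = [
    [-0.1, -0.2, -5.0],  # action 2 is clearly unsafe
    [1.0, 3.0, 9.0],     # action 2 makes the most progress
]
chosen = lexicographic_action(toy_q, slacks=[0.5, 0.0])
```

In this toy case a scalar sum of the two rows would favor action 2, whereas the prioritized filter first discards it on the safety row and then picks the best remaining action on the progress row. Widening the safety slack enough to readmit action 2 changes the choice, which is the trade-off the threshold controls.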

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage      Outcome  Tool                Model                    Prompt   Attempts  Completed
discover   success  OpenAlex-citations  -                        -        1         2026-06-18
archive    success  unpaywall           -                        -        2         2026-06-25
extract    success  cached              -                        -        2         2026-06-26
clean      success  clean               -                        -        1         2026-06-18
chunk      success  chunk               -                        -        1         2026-06-18
embed      success  embed               Qwen/Qwen3-Embedding-8B  -        1         2026-06-18
promote    success  -                   -                        -        1         2026-06-18
summarize  success  llm                 qwen3.6-27b-prismaquant  summ-v5  1         2026-06-26
tag        success  vector_similarity   -                        -        6         2026-06-18
verify     success  -                   -                        -        1         2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
