Urban Driving with Multi-Objective Deep Reinforcement Learning
DOI: 10.65109/loxq3741
Archive status: archived. Pipeline status: cataloged, verified.
Full text: available from the source via the DOI above; this catalog links to the paper but does not host it.
Summary
This paper addresses autonomous urban driving by proposing a multi-objective deep reinforcement learning (DRL) framework. The authors argue that traditional scalar reward functions struggle to balance competing driving objectives, such as safety, traffic-rule compliance, and passenger comfort, which often leads to poor exploration or unintuitive behavior. To address this, they adapt the thresholded lexicographic Q-learning framework to the deep learning setting. This lets the agent prioritize objectives hierarchically, so that safety constraints are met before other goals are optimized. The paper also introduces an extension for factored Markov Decision Processes (MDPs) that improves data efficiency by exploiting the internal structure of the driving environment.

The methodology employs a multi-objective Deep Q-Network (DQN) agent that learns a separate policy for each objective. The authors modify the standard thresholded lexicographic approach with an adaptive threshold based on the learned Q-values: at each state, the set of admissible actions is restricted to those that satisfy the higher-priority constraints. As a result, the policy either satisfies all constraints or, when that is not possible, prioritizes the most critical ones, mimicking human decision-making.

To handle large state spaces, the authors decompose the factored MDP into auxiliary tasks. They train Q-functions on factored state spaces, each covering an individual surrounding vehicle, and use the learned values as auxiliary features for the main Q-function. The state space is designed to be geometry-independent: it combines ego-vehicle kinematics, the relative positions of surrounding vehicles, and topological road information rather than raw visual inputs.

The experimental results show that this approach substantially improves data efficiency compared with an agent that does not use the factored features.
Using the factored Q-functions as features, the agent needs fewer training steps to reach comparable performance. The study also shows that the learned policy transfers zero-shot to a ring-road scenario without retraining and maintains its performance despite the change in road geometry. The authors attribute this transferability to the topology-based state representation rather than scenario-specific visual data. The multi-objective architecture is also modular: an objective that is easier to specify by hand can be implemented as a rule-based component without causing integration issues.

The significance of this work lies in decomposing complex autonomous driving tasks into manageable, prioritized sub-tasks, which addresses the exploration and reward-design challenges inherent in single-objective RL. The adaptive thresholding and factored MDP extensions provide a mechanism for handling safety-critical constraints while improving learning efficiency. The successful zero-shot transfer points toward generalizable driving policies that are not tied to specific map geometries, advancing the feasibility of deploying DRL-based agents in diverse urban environments.
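The admissible-action filtering behind thresholded lexicographic action selection can be illustrated with a short sketch. It assumes one Q-vector per objective, ordered from highest to lowest priority, and an assumed form of the adaptive threshold: an action stays admissible for objective i if its value is within a slack τ_i of the best value among the actions that are still admissible. The slack rule, the function name `lexicographic_action`, and the example numbers are hypothetical illustrations, not the paper's exact formulation.

```python
from typing import Sequence

import numpy as np


def lexicographic_action(q_values: Sequence[np.ndarray],
                         slacks: Sequence[float]) -> int:
    """Hypothetical sketch of thresholded lexicographic action selection.

    q_values: one array of Q(s, a) per objective, highest priority first.
    slacks:   assumed tolerance tau_i for every objective except the last,
              which is simply maximized over the surviving actions.
    """
    if len(slacks) != len(q_values) - 1:
        raise ValueError("need one slack per objective except the last")
    admissible = np.arange(len(q_values[0]))  # start with every action
    for q, tau in zip(q_values[:-1], slacks):
        best = q[admissible].max()
        # Assumed adaptive threshold: measured from the best still-admissible
        # value, so the best action always survives and the set is never empty.
        admissible = admissible[q[admissible] >= best - tau]
    # Lowest-priority objective: greedy choice among the remaining actions.
    last = q_values[-1]
    return int(admissible[np.argmax(last[admissible])])


# Made-up Q-values for four actions (illustration only).
safety = np.array([-0.1, -5.0, -0.2, -0.05])
rules = np.array([0.0, 0.0, -1.0, -0.5])
comfort = np.array([0.3, 0.9, 0.8, 0.5])
print(lexicographic_action([safety, rules, comfort], slacks=[0.2, 0.6]))  # 3
```

With these made-up values, maximizing comfort alone would pick action 1, which the safety Q-function rates as dangerous. The safety filter removes it, the rule-compliance filter then drops action 2, and action 3 is chosen as the most comfortable remaining option. Because each threshold is relative to the best still-admissible value, the agent always falls back to the most critical objectives when not all of them can be satisfied.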
Provenance
This table is the full processing record for this entry. Each stage of the pipeline is logged with the tool and model used, the number of attempts, and the date it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | OpenAlex-citations | — | — | 1 | 2026-06-18 |
| archive | success | unpaywall | — | — | 2 | 2026-06-25 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-18 |
| chunk | success | chunk | — | — | 1 | 2026-06-18 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-18 |
| promote | success | — | — | — | 1 | 2026-06-18 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-18 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.
Information type
What kind of knowledge this paper contributes, grouped by family and independent of topic (what the paper is about) and method (how it was studied).
- Theoretical Contribution: computational model