Learning When to Drive in Intersections by Combining Reinforcement Learning and Model Predictive Control
DOI: 10.1109/itsc.2019.8916922
archive: archived; pipeline: cataloged; verified
Summary
This paper addresses the challenge of enabling automated vehicles to safely and efficiently negotiate intersections with other road users, particularly when the intentions of those users are unknown. The authors propose a hierarchical decision-making algorithm that combines Reinforcement Learning (RL) for high-level decision making with Model Predictive Control (MPC) for low-level trajectory planning. This approach aims to overcome the limitations of conventional rule-based systems and of previous methods that struggled with complex intersection geometries or required vehicle-to-vehicle communication.
The methodology decomposes the problem into two layers. The high-level module uses Q-learning, a model-free RL approach, to select among actions such as "take way," "give way," or "follow" a specific vehicle. This policy is implemented as a Deep Q-Network with shared weights and an LSTM layer to handle partial observability and sequential dependencies. The low-level module employs an MPC controller to generate smooth, safe acceleration profiles that satisfy physical constraints and avoid collisions. The MPC provides immediate feedback to the RL agent through a reward function that penalizes predicted collisions and discomfort (jerk and acceleration), allowing the policy to learn which actions are feasible and comfortable.
The system was evaluated in a simulation environment featuring one or two crossing points, with surrounding traffic modeled by agents with varying intentions (take way, give way, cautious). The results show that the proposed RL-MPC architecture outperforms a benchmark Sliding Mode controller: the MPC-based agent achieved a higher success rate in crossing intersections without collisions or timeouts, and it required significantly fewer training episodes to converge than the Sliding Mode agent.
The study highlights that the MPC controller's ability to predict collisions and assess trajectory feasibility provides crucial feedback that accelerates learning and improves performance. Separating decision making from control allows the system to handle complex scenarios, such as multiple crossing points, more effectively than previous approaches.
The significance of this work lies in its demonstration that combining model-free RL with model-based MPC can create a robust decision-making framework for autonomous driving in unstructured environments. By using the predictive capabilities of MPC for immediate safety feedback and the adaptive learning of RL for long-term strategy, the system balances safety, comfort, and efficiency. The approach offers a viable way for automated vehicles to interact with non-automated traffic without requiring cooperative communication.
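The feedback loop described in the summary can be illustrated with a small sketch. It is not the authors' implementation: the paper uses a Deep Q-Network with an LSTM layer, whereas this sketch swaps in a tabular Q-learning update to stay short. The `PlanFeedback` fields, the reward weights, and the `"short_gap"` state label are all hypothetical names and values chosen for illustration; only the three action names and the idea of penalizing predicted collisions, acceleration, and jerk come from the summary.

```python
from dataclasses import dataclass

# High-level actions named in the summary.
ACTIONS = ("take_way", "give_way", "follow")


@dataclass
class PlanFeedback:
    """Hypothetical summary of what a low-level planner reports back."""
    collision_predicted: bool
    peak_accel: float  # m/s^2
    peak_jerk: float   # m/s^3


def reward(fb, crossed, w_collision=10.0, w_accel=0.1, w_jerk=0.05, r_success=1.0):
    """Penalize predicted collisions and discomfort; reward a completed crossing.

    All weights are illustrative assumptions, not values from the paper.
    """
    r = 0.0
    if fb.collision_predicted:
        r -= w_collision
    r -= w_accel * fb.peak_accel ** 2 + w_jerk * fb.peak_jerk ** 2
    if crossed:
        r += r_success
    return r


def q_update(q, state, action, r, next_state, done, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: Q <- Q + alpha * (target - Q)."""
    best_next = 0.0 if done else max(q.get((next_state, a), 0.0) for a in ACTIONS)
    target = r + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


# Usage: two one-step episodes from the same (hypothetical) state.
q = {}
risky = PlanFeedback(collision_predicted=True, peak_accel=3.0, peak_jerk=2.0)
smooth = PlanFeedback(collision_predicted=False, peak_accel=1.0, peak_jerk=0.5)
q_update(q, "short_gap", "take_way", reward(risky, crossed=False), "short_gap", done=True)
q_update(q, "short_gap", "give_way", reward(smooth, crossed=True), "short_gap", done=True)
best = max(ACTIONS, key=lambda a: q.get(("short_gap", a), 0.0))
print(best)
```

After these two updates the greedy choice in the hypothetical `"short_gap"` state is `give_way`, because the planner-derived penalty pushed the value of `take_way` below the untried default of zero. This mirrors, in miniature, how planner feedback can steer the high-level policy away from infeasible or uncomfortable actions.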
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-25 |
| archive | success | unpaywall | — | — | 2 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-26 |
| chunk | success | chunk | — | — | 1 | 2026-06-26 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-26 |
| enrich | success | openalex | — | — | 1 | 2026-06-26 |
| promote | success | — | — | — | 1 | 2026-06-25 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-26 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Theoretical Contribution: computational model