Deep Reinforcement Learning for Predictive Longitudinal Control of Automated Vehicles

Buechel, Martin; Knoll, Alois · 2018 · Crossref

DOI: 10.1109/itsc.2018.8569977

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the challenge of longitudinal control for automated vehicles, specifically aiming to improve upon classical Proportional-Integral (PI) controllers and computationally expensive Nonlinear Model Predictive Control (NMPC) schemes. While PI controllers suffer from poor accuracy and comfort in the presence of disturbances like road grade changes, NMPC offers high accuracy but requires significant computational resources and precise model parameter identification.

The authors propose a model-free Deep Reinforcement Learning (DRL) approach that incorporates advance knowledge of future speed references and road disturbances. A key contribution is the identification of a critical design parameter: the selection of advance knowledge signals during the training phase, which significantly impacts learning speed. The authors develop a Predictive Reinforcement Learning Controller with Incorporated Advance Knowledge (PRLC-A) using the Deep Deterministic Policy Gradient (DDPG) algorithm. To enable predictive behavior, the state vector is augmented with future speed error trajectories and road grade information over a prediction horizon. To address the challenge of designing training trajectories, the authors propose using Amplitude Modulated Pseudo Random Binary Signals (APRBS) to excite the system across the state space, rather than training on specific real-world scenarios. The reward function is designed to minimize speed tracking error and penalize high control outputs.

The system was simulated using a discrete vehicle dynamics model implemented in Python with TensorFlow, incorporating engine and brake torque dynamics, rolling resistance, and aerodynamic drag. Experimental results demonstrate that training with APRBS signals yields considerably faster learning convergence compared to training on specific evaluation datasets.
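
The training-signal, state, and reward ideas described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not give the hold-time ranges, the exact state layout, or the reward weights, so the function names (`aprbs`, `augmented_state`, `reward`), the quadratic reward form, and all parameter values are assumptions chosen for readability.

```python
import random
from typing import List, Sequence


def aprbs(n_steps: int, min_hold: int, max_hold: int,
          low: float, high: float, seed: int = 0) -> List[float]:
    """Amplitude-modulated pseudo-random binary signal (illustrative).

    Holds a random amplitude in [low, high] for a random number of steps
    in [min_hold, max_hold], then jumps to a new level.
    """
    rng = random.Random(seed)
    signal: List[float] = []
    while len(signal) < n_steps:
        hold = rng.randint(min_hold, max_hold)   # hold duration in steps
        level = rng.uniform(low, high)           # amplitude for this segment
        signal.extend([level] * hold)
    return signal[:n_steps]


def augmented_state(v: float, v_ref: Sequence[float], grade: Sequence[float],
                    k: int, horizon: int) -> List[float]:
    """State at step k: speed errors and road grades over the horizon.

    Assumption: errors are taken against the current speed v, and indices
    past the end of the trajectory repeat the last known value.
    """
    last = len(v_ref) - 1
    idx = [min(k + i, last) for i in range(horizon + 1)]
    errors = [v_ref[j] - v for j in idx]
    grades = [grade[j] for j in idx]
    return errors + grades


def reward(speed_error: float, control: float,
           w_err: float = 1.0, w_u: float = 0.1) -> float:
    """Assumed form: negative quadratic cost on error and control effort."""
    return -(w_err * speed_error ** 2 + w_u * control ** 2)


# Hypothetical usage: random speed and grade references for one episode.
v_ref = aprbs(n_steps=200, min_hold=5, max_hold=30, low=0.0, high=15.0, seed=1)
grade = aprbs(n_steps=200, min_hold=10, max_hold=50, low=-0.05, high=0.05, seed=2)
state = augmented_state(v=5.0, v_ref=v_ref, grade=grade, k=0, horizon=20)
print(len(state))  # 2 * (horizon + 1) entries
```

Because the reference jumps between random levels with random hold times, a signal of this kind visits many operating points in a short episode, which is the property the summary credits for faster learning.
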
When evaluated on a real-world driving scenario in a parking garage, the PRLC-A controller achieved tracking performance close to the optimal solution of an NMPC controller. Crucially, the DRL approach offered substantial computational advantages: inference with the PRLC-A was 30 to 70 times faster than with the NMPC controller, and its inference time remained insensitive to increases in prediction horizon length. For instance, with a prediction horizon of 20 steps, the PRLC-A required approximately 1.1 ms per cycle, whereas the NMPC required 81.1 ms.

The study concludes that DRL is a viable alternative to NMPC for predictive longitudinal control, offering near-optimal performance at significantly reduced computational cost. This makes it suitable for real-time applications in automated vehicles that have access to advance knowledge. However, the authors note two challenges: high variance between training runs and the need for extensive training samples. Future work is directed toward investigating robustness against unmodeled disturbances, such as wind and varying vehicle mass, and exploring apprenticeship or imitation learning to stabilize training and improve performance.
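
As a quick check on the timing figures quoted above, the speedup at a 20-step horizon can be computed directly from the two per-cycle times. This is a small sketch: the helper names `speedup` and `fits_cycle` are illustrative, and the 10 ms control period used at the end is a hypothetical budget, not a value from the paper.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def fits_cycle(inference_ms: float, cycle_ms: float) -> bool:
    """True if one inference completes within the control period."""
    return inference_ms <= cycle_ms


# Per-cycle times quoted in the summary for a prediction horizon of 20 steps.
NMPC_MS = 81.1
PRLC_A_MS = 1.1

ratio = speedup(NMPC_MS, PRLC_A_MS)
print(f"speedup: {ratio:.1f}x")

# Hypothetical 10 ms control period (an assumption for illustration only).
CYCLE_MS = 10.0
print("PRLC-A fits:", fits_cycle(PRLC_A_MS, CYCLE_MS))
print("NMPC fits:", fits_cycle(NMPC_MS, CYCLE_MS))
```

With these rounded figures the ratio comes out at roughly 74, at or slightly above the upper end of the quoted 30 to 70 range, which is plausible given that the range is itself stated approximately.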

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	Crossref	—	—	1	2026-06-18
archive	success	unpaywall	—	—	2	2026-06-25
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-20
chunk	success	chunk	—	—	1	2026-06-20
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-20
enrich	success	openalex	—	—	1	2026-06-20
promote	success	—	—	—	1	2026-06-18
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-20
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

anticipation

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Theoretical Contribution: computational model