Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving

Zhu, Meixin; Wang, Yinhai; Pu, Ziyuan; Hu, Jingyun; Wang, Xuesong; Ke, Ruimin · 2020 · OpenAlex-citations

DOI: 10.1016/j.trc.2020.102662

archive: archived · pipeline: cataloged, verified


Summary

This paper addresses the challenge of developing autonomous velocity control that balances safety, efficiency, and comfort, rather than simply imitating human driving. The authors argue that traditional car-following models, whether rule-based or trained by supervised learning to emulate human drivers, tend to replicate suboptimal or unsafe human behaviors. To avoid this, the study proposes a deep reinforcement learning (RL) model that optimizes driving performance directly through interaction with a simulation environment instead of approximating human acceleration patterns.

The methodology uses the Deep Deterministic Policy Gradient (DDPG) algorithm, chosen because it handles the continuous action space required for vehicle acceleration control. The model was trained using 1,341 car-following events extracted from the Next Generation Simulation (NGSIM) dataset, specifically from freeway data in Emeryville, California. The RL agent operates within a simulation environment where the state is defined by the following vehicle’s speed, spacing, and relative speed, while the action is the longitudinal acceleration. A critical component of the design is a custom reward function constructed from three features: Time to Collision (TTC) for safety, time headway for efficiency, and jerk (rate of change of acceleration) for comfort. The DDPG architecture employs actor and critic neural networks with three layers each, using experience replay and target networks to stabilize learning. The model was trained for 60 episodes, with 70% of the data used for training and 30% for testing.

The results indicate that the proposed model outperforms the human drivers observed in the NGSIM data on all three metrics. For safety, the model reduced the percentage of dangerous minimum TTC values (< 5 s) from 35% in the human data to 8%. For efficiency, the model kept time headways within the target range of 1 to 2 seconds, whereas human drivers exhibited a wider distribution that included both dangerously short and inefficiently long headways. For comfort, the model generated trajectories with lower jerk values, indicating smoother acceleration profiles than those in the empirical data. The model also generalized well, performing similarly on the training and testing datasets.

The significance of this work lies in integrating safety, efficiency, and comfort into a single reward structure, which allows an autonomous vehicle to follow a lead vehicle more safely, efficiently, and smoothly than the human drivers in the data, and so reduces the risks associated with human error and variability. The findings suggest that reinforcement learning, when structured with domain-specific rewards, can produce driving behavior that exceeds human performance on these metrics, offering a framework for future autonomous vehicle control strategies.
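
The three reward features named in the summary (TTC, time headway, jerk) and the state update of a car-following simulation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sign convention for relative speed (lead speed minus follower speed), the 0.1 s time step, the shape of each reward term, the equal weighting, and the `jerk_scale` value are all hypothetical. The 5 s TTC limit and the 1 to 2 s headway range are borrowed from the figures quoted in the summary and used here only as placeholders.

```python
import math

def time_to_collision(spacing, relative_speed):
    """Seconds until contact; infinite when the gap is not closing.
    Assumed convention: relative_speed = lead speed - follower speed."""
    if relative_speed >= 0:
        return math.inf
    return spacing / -relative_speed

def time_headway(spacing, follower_speed):
    """Seconds the follower needs to cover the current gap at its own speed."""
    if follower_speed <= 0:
        return math.inf
    return spacing / follower_speed

def jerk(accel, prev_accel, dt):
    """Finite-difference rate of change of acceleration."""
    return (accel - prev_accel) / dt

def reward(spacing, follower_speed, relative_speed, accel, prev_accel,
           dt=0.1, ttc_limit=5.0, headway_range=(1.0, 2.0), jerk_scale=10.0):
    """Illustrative reward: the term shapes and equal weights are assumptions."""
    ttc = time_to_collision(spacing, relative_speed)
    # Safety: penalize only when TTC falls below the limit (log term is negative).
    safety = math.log(max(ttc, 1e-6) / ttc_limit) if ttc < ttc_limit else 0.0
    # Efficiency: bonus while the headway stays inside the target range.
    low, high = headway_range
    headway = time_headway(spacing, follower_speed)
    efficiency = 1.0 if low <= headway <= high else 0.0
    # Comfort: quadratic penalty on jerk.
    comfort = -(jerk(accel, prev_accel, dt) / jerk_scale) ** 2
    return safety + efficiency + comfort

def step(spacing, follower_speed, lead_speed, accel, dt=0.1):
    """Advance the state (spacing, speed, relative speed) by one time step,
    holding the lead vehicle's speed constant over the step."""
    new_speed = max(follower_speed + accel * dt, 0.0)
    mean_speed = 0.5 * (follower_speed + new_speed)
    new_spacing = spacing + (lead_speed - mean_speed) * dt
    return new_spacing, new_speed, lead_speed - new_speed
```

In a training loop of the kind the summary describes, the agent's action would be passed to `step` as `accel`, and `reward` would be evaluated on the resulting state. In the data-driven setting described above, the lead vehicle's speed would come from the recorded trajectory rather than being held fixed.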

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	OpenAlex-citations	—	—	1	2026-06-24
archive	success	unpaywall	—	—	2	2026-06-26
extract	success	pdftotext	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-26
chunk	success	chunk	—	—	1	2026-06-26
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-26
enrich	success	semantic_scholar	—	—	1	2026-06-26
promote	success	—	—	—	1	2026-06-24
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-26
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

following distance

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Theoretical Contribution: computational model