A Leading Cruise Controller for Autonomous Vehicles in Mixed Autonomy Based on Preference-Based Reinforcement Learning

Wen, Xiao Luke; Jian, Sisi; He, Dengbo · 2024 · IEEE IV

DOI: 10.1109/iv55156.2024.10588421

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses a limitation of existing autonomous vehicle (AV) car-following controllers: they often prioritize the AV’s own utility while neglecting the safety and efficiency of surrounding human-driven vehicles (HDVs). This "self-centered" approach can lead to aggressive driving and traffic instability in mixed-autonomy environments. The study proposes a "leading cruise controller" for AVs that considers the behavior of both the lead HDV and the following HDV (FHDV) to improve overall traffic flow performance, specifically safety, efficiency, and string stability.

The methodology uses a three-vehicle car-following scenario modeled as a Markov Decision Process. The AV’s longitudinal acceleration is controlled by a Preference-based Soft Actor-Critic (PbSAC) algorithm. To simulate realistic mixed traffic, the study uses real-world data from the Waymo Open Dataset, and the FHDV's human driving behavior is approximated with Inverse Reinforcement Learning (IRL). The PbSAC reward function has four terms: control efficiency and string stability, each for both the AV and the FHDV. A key innovation is a preference-adjusting module that dynamically updates the weights of these reward terms from expert evaluations using a Bayesian learning framework, which avoids manual weight tuning.

Experimental results compare the proposed PbSAC controller (Scenario 1) against three baselines: PbSAC ignoring FHDV benefits (Scenario 2), standard SAC with manual weights (Scenario 3), and Model Predictive Control (Scenario 4). The proposed controller significantly improved safety for the FHDV, as evidenced by lower critical Time-to-Collision (TTC) values compared to the other scenarios. Average speeds remained similar across all scenarios, indicating comparable efficiency, but the proposed method achieved superior string stability: it recorded the lowest average standard deviation of speed for both the AV and the FHDV, demonstrating its ability to dampen traffic oscillations. The preference-adjusting module also proved effective, with Scenario 2 outperforming Scenario 3 in stability, which highlights the benefit of adaptive weight adjustment over fixed parameters.

The significance of this work lies in its demonstration that AV controllers can act as virtual regulators that stabilize mixed traffic flows by accounting for the dynamics of the vehicles following them. By integrating human preferences into the reinforcement learning reward structure, the PbSAC algorithm provides a flexible and adaptive solution for multi-objective control. The findings suggest that accounting for the utility of following HDVs reduces crash risk and improves traffic smoothness without sacrificing efficiency, offering a robust approach for AV deployment in transitional mixed-autonomy environments.
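The weighted four-term reward and the idea of updating its weights from expert comparisons can be illustrated with a minimal sketch. This is not the paper's implementation: the term names, the Bradley-Terry (logistic) likelihood, the particle approximation of the posterior, and the example trajectories are all assumptions chosen for illustration, since the summary only states that a Bayesian learning framework is used.

```python
import math
import random

# Hypothetical names for the four reward terms described in the summary.
TERMS = ("av_efficiency", "av_stability", "fhdv_efficiency", "fhdv_stability")


def weighted_reward(terms, weights):
    """Scalar reward as a weighted sum of the four reward terms."""
    return sum(weights[name] * terms[name] for name in TERMS)


def sample_simplex(rng):
    """Draw one weight vector uniformly from the probability simplex."""
    draws = [rng.expovariate(1.0) for _ in TERMS]
    total = sum(draws)
    return {name: d / total for name, d in zip(TERMS, draws)}


def bayes_update(particles, probs, preferred, rejected):
    """Reweight candidate weight vectors after one expert comparison.

    Assumed likelihood: a logistic (Bradley-Terry) model in which the expert
    is more likely to prefer the trajectory with the higher weighted return.
    """
    posterior = []
    for w, p in zip(particles, probs):
        gap = weighted_reward(preferred, w) - weighted_reward(rejected, w)
        posterior.append(p / (1.0 + math.exp(-gap)))
    norm = sum(posterior)
    return [p / norm for p in posterior]


def posterior_mean(particles, probs):
    """Expected weight vector under the current particle posterior."""
    return {
        name: sum(p * w[name] for w, p in zip(particles, probs))
        for name in TERMS
    }


# Illustrative data: two trajectories that differ only in how badly the
# following vehicle's speed oscillates (values are made up).
rng = random.Random(0)
particles = [sample_simplex(rng) for _ in range(500)]
probs = [1.0 / len(particles)] * len(particles)

traj_a = {"av_efficiency": -1.0, "av_stability": -0.5,
          "fhdv_efficiency": -1.0, "fhdv_stability": -0.2}
traj_b = {"av_efficiency": -1.0, "av_stability": -0.5,
          "fhdv_efficiency": -1.0, "fhdv_stability": -2.0}

# The expert repeatedly prefers the smoother trajectory A over B.
for _ in range(20):
    probs = bayes_update(particles, probs, traj_a, traj_b)

weights = posterior_mean(particles, probs)
```

In this sketch, repeated preferences for the trajectory with less FHDV oscillation shift posterior mass toward weight vectors that emphasize `fhdv_stability`, which is the qualitative behavior one would expect from a preference-adjusting module that replaces manual tuning.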

Key finding

The proposed preference-based soft actor-critic controller improves safety and string stability for both the autonomous vehicle and the following human-driven vehicle by accounting for the benefits of the entire three-vehicle platoon.
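The two evaluation quantities mentioned in this entry, Time-to-Collision for safety and the standard deviation of speed for string stability, can be computed as in the sketch below. It uses the standard TTC definition (gap divided by closing speed); the function names, the 3-second critical threshold, and the sample values are assumptions for illustration and are not taken from the paper.

```python
import math
import statistics


def time_to_collision(gap_m, follower_speed, leader_speed):
    """TTC in seconds: gap divided by closing speed.

    Returns infinity when the follower is not closing in on the leader.
    """
    closing = follower_speed - leader_speed
    if closing <= 0:
        return math.inf
    return gap_m / closing


def count_critical_ttc(gaps, follower_speeds, leader_speeds, threshold_s=3.0):
    """Count time steps whose TTC falls below a (hypothetical) threshold."""
    return sum(
        time_to_collision(g, vf, vl) < threshold_s
        for g, vf, vl in zip(gaps, follower_speeds, leader_speeds)
    )


def speed_std(speeds):
    """Population standard deviation of a speed trace (m/s).

    A lower value indicates smoother driving and weaker oscillations.
    """
    return statistics.pstdev(speeds)


# Made-up trace: the follower closes a shrinking gap at 5 m/s.
gaps = [30.0, 20.0, 10.0]
follower = [15.0, 15.0, 15.0]
leader = [10.0, 10.0, 10.0]

critical_steps = count_critical_ttc(gaps, follower, leader)
follower_std = speed_std(follower)
```

Applied to the AV-FHDV pair, a controller that keeps `critical_steps` small and the FHDV's `speed_std` low would match the qualitative outcome this entry reports.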

Methodology

simulation_modeling

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	—	—	—	1	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-07
chunk	success	chunk	—	—	1	2026-06-07
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-07
enrich	success	semantic_scholar	—	—	4	2026-06-15
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

following distance

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Theoretical Contribution: computational model