Evaluation of MPC-based Imitation Learning for Human-like Autonomous Driving

Acerbo, Flavia Sofia; Swevers, Jan; Tuytelaars, Tinne; Son, Tong Duy · 2023 · Crossref

DOI: 10.1016/j.ifacol.2023.10.1257

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the challenge of achieving human-like autonomous driving by combining imitation learning (IL) with differentiable model predictive control (MPC). While data-driven approaches like behavioral cloning show promise, they often suffer from covariate shift and lack guarantees regarding safety, comfort, and stability. The authors propose a hierarchical framework that integrates a learning-based policy with an MPC controller to better imitate dynamic human behaviors while maintaining robust closed-loop performance. The study specifically investigates the impact of differentiable MPC policies, hierarchical decomposition, and closed-loop training strategies on learning robustness and imitation quality.

The methodology involves a lane-keeping control system evaluated using human demonstrations collected from a fixed-base driving simulator. The vehicle dynamics are modeled using a bicycle model in Frenet coordinates, and the MPC is formulated as a differentiable optimization problem. The authors compare three policy types: a pure neural network (NN), a pure MPC, and a hierarchical NN-MPC combination. They evaluate these policies using open-loop behavioral cloning (BC), supervised learning (SL), and a novel closed-loop state cloning (SC) algorithm. The SC algorithm approximates the policy gradient through time by using the MPC’s state-space model to backpropagate through the environment dynamics, thereby mitigating covariate shift. Performance is measured using metrics for imitation (open-loop error and closed-loop likelihood), safety (lane boundary violations), comfort (lateral jerk), and human-like characteristics (steering reversal rate and lateral deviation).

The results demonstrate that pure NN policies trained with open-loop BC suffer from severe covariate shift and drive off the lane in closed-loop simulations. In contrast, the differentiable MPC policy trained with BC exhibits stable closed-loop behavior and captures human preferences, such as lateral positioning, despite higher open-loop loss. The hierarchical NN-MPC policy further improves imitation by learning dynamic setpoints; however, it remains susceptible to causal confusion when the prediction horizon is reduced, leading to performance degradation. The introduction of closed-loop state cloning largely resolves these issues. For the hierarchical policy with a short horizon, SC improved closed-loop likelihood by 94% compared to supervised learning alone, eliminated lane violations, and enhanced comfort metrics. The best-performing configuration, a hierarchical NN-MPC controlling steering rate with SC, achieved high imitation fidelity, low lateral jerk, and steering reversal rates comparable to human drivers.

The significance of this work lies in demonstrating that augmenting open-loop behavioral cloning with closed-loop training via differentiable MPC yields more robust and human-like policies. The study highlights that while hierarchical frameworks can capture complex human dynamics, they require closed-loop training to avoid causal confusion and covariate shift. By leveraging the differentiability of MPC, the authors provide a method to approximate policy gradients efficiently, enabling the learning of safe, comfortable, and human-like driving behaviors that are difficult to achieve with pure learning-based or pure model-based approaches.
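To make the difference between the open-loop and closed-loop objectives concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a kinematic bicycle model in Frenet coordinates with explicit Euler integration (the summary does not say whether the paper's model is kinematic or dynamic), and every name and numeric value in it (`frenet_bicycle_step`, `rollout`, the speed, wheelbase, gains, and time step) is hypothetical. It only evaluates the two losses; the SC algorithm described above additionally backpropagates through the model to obtain gradients, which this sketch does not attempt.

```python
import math

def frenet_bicycle_step(state, steer, kappa=0.0, v=15.0, wheelbase=2.7, dt=0.05):
    """One explicit-Euler step of a kinematic bicycle model in Frenet coordinates.

    state = (s, d, e_psi): arc length along the lane centre line [m],
    lateral deviation [m], heading error [rad]. kappa is the road curvature.
    All numeric defaults are illustrative assumptions, not values from the paper.
    """
    s, d, e_psi = state
    s_dot = v * math.cos(e_psi) / (1.0 - d * kappa)
    d_dot = v * math.sin(e_psi)
    e_psi_dot = v * math.tan(steer) / wheelbase - kappa * s_dot
    return (s + dt * s_dot, d + dt * d_dot, e_psi + dt * e_psi_dot)

def rollout(policy, state, steps, **model_kwargs):
    """Closed-loop rollout: the policy is fed the states it produced itself."""
    trajectory = [state]
    for _ in range(steps):
        state = frenet_bicycle_step(state, policy(state), **model_kwargs)
        trajectory.append(state)
    return trajectory

def open_loop_action_loss(policy, demo_states, demo_actions):
    """BC-style loss: mean squared action error on the demonstrated states."""
    errors = [(policy(x) - u) ** 2 for x, u in zip(demo_states, demo_actions)]
    return sum(errors) / len(errors)

def closed_loop_state_loss(policy, demo_states, **model_kwargs):
    """SC-style loss: mean squared state error along the policy's own rollout."""
    traj = rollout(policy, demo_states[0], len(demo_states) - 1, **model_kwargs)
    errors = [(x[1] - xd[1]) ** 2 + (x[2] - xd[2]) ** 2
              for x, xd in zip(traj, demo_states)]
    return sum(errors) / len(errors)

# Hypothetical stand-in for a human demonstrator: a linear steering law.
def expert(state):
    _, d, e_psi = state
    return -0.05 * d - 0.5 * e_psi

# A policy that never steers, used as a deliberately poor imitator.
def no_steer(state):
    return 0.0

# Synthetic demonstration on a straight road, starting 0.5 m off-centre.
demo_states = rollout(expert, (0.0, 0.5, 0.0), steps=200)
demo_actions = [expert(x) for x in demo_states[:-1]]

bc_loss = open_loop_action_loss(no_steer, demo_states, demo_actions)
sc_loss = closed_loop_state_loss(no_steer, demo_states)
```

The open-loop loss scores the policy only on states the demonstrator visited, while the closed-loop loss scores the states the policy actually reaches, which is where covariate shift shows up. Swapping `no_steer` for `expert` drives both losses to zero, since the rollout is deterministic.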

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	Crossref	—	—	1	2026-06-25
archive	success	openalex	—	—	5	2026-06-26
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-26
chunk	success	chunk	—	—	1	2026-06-26
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-26
enrich	success	openalex	—	—	1	2026-06-26
promote	success	—	—	—	1	2026-06-25
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-26
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

lane positioning

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Theoretical Contribution: computational model