Data-Driven Traffic Simulation: A Comprehensive Review
Summary
This paper presents a comprehensive review of data-driven microscopic traffic simulation, addressing the challenge of validating autonomous vehicle (AV) algorithms efficiently and safely. On-road tests, track tests, and driving simulators are the standard validation methods, but they suffer from high costs, poor scalability, and an inability to replicate the unpredictable nature of real-world driving. Data-driven simulation offers a solution by enabling large-scale testing with realistic, reactive background traffic. The authors identify a gap in the existing literature: no prior review sufficiently covers the scope and depth of data-driven methods, toward which the field has shifted from rule-based models because of the latter’s limited accuracy, poor generalization, and reliance on expert knowledge.
The review is structured around the traffic simulation framework: input data, core modeling, and output evaluation. It details problem formulations using Markov Decision Processes (MDPs) and non-MDP approaches. The authors analyze input modalities (camera, LiDAR, radar, GNSS, HD maps) and compare datasets by view type: Field of View (FOV), which provides immersive, egocentric perspectives, and Bird’s Eye View (BEV), which offers precise semantic localization.
Context representation methods are categorized into rasterized, vectorized, and graph-based approaches, each with distinct trade-offs in computational efficiency and geometric precision. Agent modeling is examined through physics-based, statistical, and learning-based methods; learning-based approaches leverage neural networks for adaptability at a higher computational cost. Interaction modeling is divided into implicit methods, which use latent variables for adaptability, and explicit methods, which offer better interpretability through defined relationships such as pass-and-yield.
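The difference between the rasterized and vectorized context representations named above can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the 8x8 grid, the 1 m cell size, and the sample lane are all assumptions chosen for illustration. The same lane centerline is stored once as exact line segments and once as occupied cells in a BEV grid.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[float, float, float, float]


def vectorize(polyline: List[Point]) -> List[Segment]:
    """Vectorized context: keep every lane segment as exact (x0, y0, x1, y1)."""
    return [(x0, y0, x1, y1) for (x0, y0), (x1, y1) in zip(polyline, polyline[1:])]


def rasterize(polyline: List[Point], size: int = 8, resolution: float = 1.0,
              samples_per_segment: int = 20) -> List[List[int]]:
    """Rasterized context: mark the BEV grid cells the polyline passes through.

    The grid is indexed as grid[row][col], with row from y and col from x.
    Sub-cell geometry is lost: any point inside a cell maps to the same cell.
    """
    grid = [[0] * size for _ in range(size)]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        for k in range(samples_per_segment + 1):
            t = k / samples_per_segment
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col, row = int(x // resolution), int(y // resolution)
            if 0 <= row < size and 0 <= col < size:
                grid[row][col] = 1
    return grid


# Hypothetical lane centerline in metres (illustrative values only).
lane = [(0.5, 0.5), (3.5, 0.5), (6.5, 3.5)]
vectors = vectorize(lane)   # two exact segments
grid = rasterize(lane)      # 8x8 occupancy grid at 1 m per cell

for row in reversed(grid):  # print with y increasing upward
    print("".join("#" if cell else "." for cell in row))
```

In this toy setup the vectorized form stores two tuples and preserves exact coordinates, while the raster stores 64 cells and can only locate the lane to within one cell. That is one way to read the trade-off between computational cost and geometric precision that the summary describes.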
The paper evaluates prevalent learning models, including imitation learning, reinforcement learning, deep generative models, and deep learning, summarizing their advantages and limitations. It also reviews the evaluation metrics used to assess simulation performance, categorized into realism (reconstruction ability), reactivity (safe response to dynamic environments), and diversity (coverage of agent policies). The authors distinguish between open-loop evaluation, where predictions are scored against logged data but do not drive the system forward, and closed-loop evaluation, where the model's outputs are fed back into the simulation, which tests long-term stability.
The significance of this work lies in its systematic organization of the rapidly evolving field of data-driven traffic simulation. By providing a critical analysis of current methodologies, datasets, and evaluation metrics, the paper establishes a foundational reference for researchers. It highlights open challenges, such as the lack of transparency in learning-based models and the difficulty of simulating long-horizon interactions, and outlines future research directions. The review aims to guide the development of more realistic, reactive, and diverse simulation environments, thereby accelerating the safe deployment of autonomous vehicles.
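The open-loop and closed-loop evaluation modes distinguished in the summary can be contrasted with a small sketch. It is an illustration under stated assumptions, not the paper's protocol: the state is a 1D position, the logged trajectory and the slightly biased policy are hypothetical, and mean absolute displacement error stands in for a realism metric.

```python
from typing import Callable, List

Policy = Callable[[float], float]


def open_loop_errors(policy: Policy, logged: List[float]) -> List[float]:
    """Predict one step ahead from each logged state; outputs never feed back."""
    return [abs(policy(logged[t]) - logged[t + 1]) for t in range(len(logged) - 1)]


def closed_loop_errors(policy: Policy, logged: List[float]) -> List[float]:
    """Roll the policy forward on its own outputs, starting from the first logged state."""
    state = logged[0]
    errors = []
    for t in range(len(logged) - 1):
        state = policy(state)  # the model's output becomes its next input
        errors.append(abs(state - logged[t + 1]))
    return errors


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical log: a vehicle advancing 1.0 m per step.
logged = [float(t) for t in range(6)]


def biased_policy(position: float) -> float:
    """Hypothetical learned policy that slightly underestimates speed (0.9 m per step)."""
    return position + 0.9


open_err = open_loop_errors(biased_policy, logged)
closed_err = closed_loop_errors(biased_policy, logged)
print("open-loop mean error:  ", round(mean(open_err), 3))
print("closed-loop mean error:", round(mean(closed_err), 3))
```

With these assumed values the open-loop error stays near 0.1 m at every step, while the closed-loop error grows with each step because the bias accumulates. This compounding is why closed-loop rollouts are the ones that expose long-horizon instability.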
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | OpenAlex-citations | — | — | 1 | 2026-06-18 |
| archive | success | semantic_scholar | — | — | 6 | 2026-06-25 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-18 |
| chunk | success | chunk | — | — | 1 | 2026-06-18 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-18 |
| promote | success | — | — | — | 1 | 2026-06-18 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-18 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Methodological Resource: tool software