Test Your Self-Driving Algorithm: An Overview of Publicly Available Driving Datasets and Virtual Testing Environments
Summary
This paper addresses the critical need for rigorous testing strategies for autonomous driving systems, particularly those targeting SAE Level 5 functionality. As algorithms for self-driving vehicles become increasingly complex, thorough evaluation of individual software units and the complete data processing chain is mandatory. The authors identify a gap in existing literature regarding a comprehensive overview of resources for both open-loop testing (using recorded sensor data) and closed-loop testing (using virtual environments). Motivated by the resource-intensive nature of collecting new data and the limitations of open-loop testing alone, the study aims to guide researchers and developers in selecting appropriate publicly available datasets and simulation tools.
The methodology involves a systematic survey of publicly accessible driving datasets and virtual testing environments. To identify datasets, the authors employed a four-step process: initial Google searches, forward snowballing through dataset web pages, collection of accompanying scientific publications, and backward snowballing through publication references. Inclusion criteria required data collected from on-board sensors on public roads, containing camera, LiDAR, or radar data, and offering full or partial open access. Similarly, virtual testing environments were identified via Google and YouTube searches, publication collection, and snowballing among project websites, focusing on open-source solutions relevant to autonomous driving simulation.
The study presents an updated and expanded overview of 37 publicly available driving datasets and 22 virtual testing environments. The dataset survey extends previous work by including ten recently published datasets, such as Apollo, nuScenes, and the Oxford RobotCar dataset. The authors provide detailed descriptions of each resource, including providers, sensor setups, and specific highlights, such as the KITTI Vision Benchmark Suite’s prestige in stereo vision evaluation or the nuScenes dataset’s comprehensive sensor suite coverage. The analysis reveals that most datasets were released after 2009, with a significant increase in collections since 2016. Geographically, data collection is concentrated in Europe and the US, with Germany being the most active region, though the authors note a lack of diversity in other global regions.
The significance of this work lies in its provision of a structured guide for the autonomous driving research community. By cataloging these resources, the paper facilitates the selection of appropriate tools for specific algorithmic evaluations, supporting both perception approach experiments and end-to-end simulations. The authors conclude that while recorded data offers high fidelity for open-loop testing, virtual environments are essential for scalable, closed-loop validation. They emphasize the importance of community contributions through open-source datasets and simulations, urging future data collection to expand geographically to ensure algorithms are robust across diverse traffic conditions and environments. This comprehensive overview serves as a foundational reference for developing and validating self-driving vehicle algorithms.
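The dataset inclusion criteria described in the summary (on-board sensors, public roads, at least one of camera, LiDAR, or radar, and full or partial open access) can be read as a simple filter over candidate datasets. The following is a minimal sketch under stated assumptions, not the authors' procedure: the `DatasetRecord` type, its field names, and the `example-*` entries are hypothetical and do not describe any real dataset.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Sensor modalities named in the inclusion criteria.
QUALIFYING_SENSORS = {"camera", "lidar", "radar"}


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical record for one candidate dataset (field names are assumptions)."""
    name: str
    on_board_sensors: bool   # data recorded by sensors mounted on the vehicle
    public_roads: bool       # data collected on public roads
    sensors: FrozenSet[str]  # e.g. frozenset({"camera", "lidar"})
    access: str              # "full", "partial", or "closed"


def meets_inclusion_criteria(d: DatasetRecord) -> bool:
    """Return True if the record satisfies all criteria listed in the summary."""
    has_qualifying_sensor = bool(QUALIFYING_SENSORS & {s.lower() for s in d.sensors})
    is_open = d.access in {"full", "partial"}
    return d.on_board_sensors and d.public_roads and has_qualifying_sensor and is_open


# Made-up candidates, for illustration only.
candidates = [
    DatasetRecord("example-a", True, True, frozenset({"camera", "lidar"}), "full"),
    DatasetRecord("example-b", True, False, frozenset({"camera"}), "full"),   # not public roads
    DatasetRecord("example-c", True, True, frozenset({"radar"}), "partial"),
    DatasetRecord("example-d", True, True, frozenset({"gps"}), "full"),       # no qualifying sensor
    DatasetRecord("example-e", True, True, frozenset({"camera"}), "closed"),  # no open access
]

selected = [d.name for d in candidates if meets_inclusion_criteria(d)]
print(selected)
```

Expressing the criteria as one predicate keeps each condition visible and easy to adjust, for instance if a later survey were to admit additional sensor types.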
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | OpenAlex-citations | — | — | 1 | 2026-06-25 |
| archive | success | unpaywall | — | — | 2 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-25 |
| chunk | success | chunk | — | — | 1 | 2026-06-25 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-25 |
| promote | success | — | — | — | 1 | 2026-06-25 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-25 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Methodological Resource: tool software, dataset resource