Scalability in Perception for Autonomous Driving: Waymo Open Dataset

Sun, Pei; Kretzschmar, Henrik; Dotiwalla, Xerxes; Chouard, Aurelien; Patnaik, Vijaysai; Tsui, Paul; Guo, James; Zhou, Yin; Chai, Yuning; Caine, Benjamin; Vasudevan, Vijay; Han, Wei; Ngiam, Jiquan; Zhao, Hang; Timofeev, Aleksei; Ettinger, Scott; Krivokon, Maxim; Gao, Amy; Joshi, Aditya; Zhang, Yu; Shlens, Jonathon; Chen, Zhifeng; Anguelov, Dragomir · 2020 · OpenAlex-citations

DOI: 10.1109/cvpr42600.2020.00252

Archive status: archived. Pipeline status: cataloged, verified.


Summary

This paper introduces the Waymo Open Dataset, a large-scale, diverse multimodal dataset for research in autonomous driving perception. The authors argue that existing datasets lack the scale and geographical variation needed for models to generalize across operating regions. The dataset provides synchronized, calibrated data from multiple high-resolution cameras and high-quality LiDAR scanners, recorded in urban and suburban environments in San Francisco, Phoenix, and Mountain View.

The dataset comprises 1,150 scenes, each spanning 20 seconds, captured with an industrial-strength sensor suite of five LiDAR sensors and five cameras. LiDAR data is encoded as range images in which each pixel carries range, intensity, elongation, and vehicle pose; camera images are JPEG-compressed and include rolling shutter timing information. The authors provide exhaustive manual annotations for vehicles, pedestrians, cyclists, and signs, resulting in approximately 12 million 3D LiDAR bounding boxes and 12 million 2D camera bounding boxes, with consistent tracking identifiers across frames. The dataset covers a geographical area of 76 km², which the authors report as 15 times more diverse than the largest comparable camera+LiDAR dataset under their geographical coverage metric. It is split into training (798 scenes), validation (202 scenes), and testing (150 scenes) sets, with the test set reserved for a geographical holdout area to evaluate generalization.

The paper establishes benchmark results for 2D and 3D object detection and tracking. For 3D detection, the authors propose a new metric, APH (Average Precision weighted by Heading), which folds heading accuracy into the standard AP calculation because accurate orientation prediction is critical in autonomous driving. For tracking, they use the standard Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP) metrics.
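The APH idea described above can be sketched in a few lines. This is a minimal illustration, not the official evaluation code: it assumes each true positive is weighted by one minus the wrapped heading error divided by π, and it integrates an interpolated precision-recall curve in the simplest possible way. The function names `heading_accuracy` and `average_precision` and the tuple layout of `detections` are hypothetical, chosen only for this example.

```python
import math

def heading_accuracy(pred_heading, gt_heading):
    """Weight in [0, 1]: 1 for a perfect heading, 0 for a 180-degree flip.

    Assumption for this sketch: weight = 1 - wrapped_error / pi,
    with headings given in radians.
    """
    diff = abs(pred_heading - gt_heading) % (2 * math.pi)
    wrapped = min(diff, 2 * math.pi - diff)  # shortest angular distance
    return 1.0 - wrapped / math.pi

def average_precision(detections, num_gt, use_heading=False):
    """Interpolated AP (or APH when use_heading=True), scaled to 0..100.

    detections: list of (score, is_true_positive, heading_accuracy) tuples.
    num_gt: number of ground-truth objects.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp, fp, weighted = 0, 0, 0.0
    points = []  # (recall, precision) after each detection
    for _score, is_tp, acc in ranked:
        if is_tp:
            tp += 1
            # APH counts each true positive by its heading accuracy.
            weighted += acc if use_heading else 1.0
        else:
            fp += 1
        points.append((tp / num_gt, weighted / (tp + fp)))
    # Integrate max precision at any recall >= r over the recall axis.
    total, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        best = max(p for _, p in points[i:])
        total += (recall - prev_recall) * best
        prev_recall = recall
    return 100.0 * total

# Two correct boxes, but the second is predicted facing backwards.
dets = [
    (0.9, True, heading_accuracy(0.0, 0.0)),
    (0.8, True, heading_accuracy(0.0, math.pi)),
]
ap = average_precision(dets, num_gt=2)
aph = average_precision(dets, num_gt=2, use_heading=True)
```

In this toy case the box geometry is perfect, so AP stays at its maximum, while APH drops because one detection has its heading reversed. That gap is the behavior the metric is meant to expose.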
Baseline experiments using methods such as PointPillars for 3D detection demonstrate the dataset's utility. The authors also highlight its potential for domain adaptation research, noting that the pronounced differences between the captured geographies create a significant domain gap. In addition, the tight synchronization between sensors, with errors bounded to within milliseconds, supports cross-domain learning and sensor fusion studies.

The significance of this work lies in giving the research community what the authors present as one of the largest and most diverse multimodal autonomous driving datasets available at the time, enabling rigorous evaluation of model generalization and scalability. By releasing high-quality ground truth and standardized evaluation metrics, the dataset supports progress on perception tasks such as object detection, tracking, and sensor fusion. The geographical holdout set specifically targets generalization to unseen environments, a crucial factor for the viability of autonomous driving technology.
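The tracking metrics named in the summary, MOTA and MOTP, follow the standard CLEAR MOT definitions: MOTA is one minus the ratio of total errors (misses, false positives, identity mismatches) to total ground-truth objects, and MOTP is the mean matching distance over all matched pairs. The sketch below assumes per-frame counts have already been produced by some matcher; the function name `clear_mot`, the dictionary keys, and the choice of distance measure are hypothetical and not taken from the paper.

```python
def clear_mot(frames):
    """Compute (MOTA, MOTP) from per-frame tracking statistics.

    frames: list of dicts with hypothetical keys
      num_gt             number of ground-truth objects in the frame
      misses             ground-truth objects with no matched track
      false_positives    tracks with no matched ground-truth object
      mismatches         identity switches in the frame
      matched_distances  list of distances for matched pairs
                         (the distance measure is left to the caller)
    """
    total_gt = sum(f["num_gt"] for f in frames)
    errors = sum(
        f["misses"] + f["false_positives"] + f["mismatches"] for f in frames
    )
    distances = [d for f in frames for d in f["matched_distances"]]
    if total_gt == 0 or not distances:
        raise ValueError("need at least one ground-truth object and one match")
    mota = 1.0 - errors / total_gt            # can be negative if errors > GT
    motp = sum(distances) / len(distances)    # mean distance over matches
    return mota, motp

# Two frames with four ground-truth objects each (made-up numbers).
example_frames = [
    {"num_gt": 4, "misses": 1, "false_positives": 0, "mismatches": 0,
     "matched_distances": [0.1, 0.2, 0.3]},
    {"num_gt": 4, "misses": 0, "false_positives": 1, "mismatches": 1,
     "matched_distances": [0.2, 0.2, 0.2, 0.2]},
]
mota, motp = clear_mot(example_frames)
```

Here three errors against eight ground-truth objects give a MOTA of 0.625, and the seven matched pairs average a distance of 0.2. Benchmarks often report MOTA as a percentage, so a scaling factor of 100 may apply when comparing against published tables.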

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	OpenAlex-citations	—	—	1	2026-06-25
archive	success	semantic_scholar	—	—	6	2026-06-26
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-25
chunk	success	chunk	—	—	1	2026-06-25
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-25
promote	success	—	—	—	1	2026-06-25
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-25
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

distraction detection algorithms

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Methodological Resource: dataset resource, tool software