Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
Summary
This paper addresses the performance gap between LiDAR-based and image-based 3D object detection for autonomous driving. LiDAR provides accurate 3D point clouds but is expensive; stereo and monocular cameras are affordable but have historically yielded drastically lower detection accuracy. The authors challenge the prevailing assumption that this gap is caused by the poor precision of image-based depth estimation. Instead, they argue that it stems from how depth is represented: existing methods feed depth into 2D image pipelines as additional channels, which distorts spatial relationships and makes distant objects harder to detect.
To bridge this gap, the authors propose a "pseudo-LiDAR" representation. The method is a two-step pipeline: first, estimate a dense depth map from stereo or monocular imagery using state-of-the-art algorithms (e.g., PSMNet or DORN); second, back-project each pixel's depth into 3D coordinates to generate a 3D point cloud that mimics a LiDAR signal. This pseudo-LiDAR data is then processed with existing LiDAR-based detection architectures, such as Frustum PointNet and AVOD, which operate on point clouds or bird’s-eye view projections rather than on 2D image channels. The authors show that this representation preserves physical object sizes regardless of distance and ensures that convolutional operations group physically adjacent points, unlike 2D convolutions on depth maps, which mix points that are far apart in 3D.
Experiments on the KITTI benchmark reveal substantial improvements in detection accuracy. Using pseudo-LiDAR with stereo depth estimation, the approach achieves a 3D average precision (AP) of 45.3% for moderately hard car instances at an IoU of 0.7, compared to the previous state-of-the-art image-based result of 22%. This roughly doubles the previous result and significantly narrows the gap with LiDAR-based systems.
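The back-projection step described above can be sketched under a standard pinhole camera model. This is a minimal illustration, not the authors' code: the function name `depth_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth values are all assumptions introduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a dense depth map into camera-frame 3D points.

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx and y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v indexes image rows
        for u, z in enumerate(row):          # u indexes image columns
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x3 depth map in metres (hypothetical values, not real data).
depth_map = [[10.0, 10.0, 0.0],
             [20.0, 20.0, 20.0]]
cloud = depth_to_points(depth_map, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

Each resulting point carries metric coordinates, so an object spans the same physical extent in the cloud whether it is near or far, which is the property the summary attributes to the pseudo-LiDAR representation.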
The study also shows that pseudo-LiDAR is robust across different depth estimation algorithms and detection architectures. Notably, the performance gain is attributed primarily to the representation change rather than to improvements in depth estimation accuracy, as even less accurate disparity estimators yielded better detection results when converted to pseudo-LiDAR than when used in traditional 2D pipelines.
The significance of this work lies in demonstrating that high-quality 3D object detection can be achieved using affordable camera systems by aligning data representations with the inductive biases of 3D detection networks. By converting visual depth into a LiDAR-like format, the method enables the use of powerful LiDAR-specific algorithms on camera data. This finding suggests that future research should focus on representation engineering rather than solely on improving depth estimation precision, potentially enabling safer and more cost-effective autonomous driving systems that rely on vision sensors.
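For the stereo case, the disparity estimates mentioned above are commonly turned into depth with the standard rectified-stereo relation z = f * b / d before any back-projection. The sketch below assumes that relation; the name `disparity_to_depth` and the focal length and baseline values are illustrative assumptions, not calibration data from the paper.

```python
def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> float:
    """Convert a stereo disparity (pixels) to depth (metres).

    Assumes a rectified stereo pair, where depth = focal * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid depth")
    return focal_px * baseline_m / disparity_px


# Illustrative values only: 700 px focal length, 0.5 m baseline.
near = disparity_to_depth(35.0, focal_px=700.0, baseline_m=0.5)
far = disparity_to_depth(7.0, focal_px=700.0, baseline_m=0.5)
```

With these assumed values, a disparity of 35 px corresponds to 10 m and a disparity of 7 px to 50 m.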
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | OpenAlex-citations | — | — | 1 | 2026-06-18 |
| archive | success | semantic_scholar | — | — | 6 | 2026-06-25 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-18 |
| chunk | success | chunk | — | — | 1 | 2026-06-18 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-18 |
| promote | success | — | — | — | 1 | 2026-06-18 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-18 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.