PIXOR: Real-time 3D Object Detection from Point Clouds
Archive: archived. Pipeline: cataloged, verified.
Summary
This paper addresses the challenge of real-time 3D object detection from LIDAR point clouds, a critical requirement for autonomous driving safety. Existing methods struggle with the high computational cost of processing unstructured, sparse 3D data. Approaches using 3D voxel grids are inefficient because most voxels are empty, while 2D projection methods often suffer from information loss or require multi-modal fusion that hinders real-time performance.
The authors propose PIXOR (ORiented 3D object detection from PIXel-wise neural network predictions), a single-stage, proposal-free detector designed to balance high accuracy with real-time efficiency. PIXOR uses a Bird’s Eye View (BEV) representation of the LIDAR point cloud, projecting 3D points onto a 2D grid. This representation preserves metric space and object priors while enabling efficient 2D convolution. The input features include occupancy and intensity channels. The network architecture consists of a backbone for feature extraction and a header for dense predictions. The backbone employs a modified residual network with a top-down path that up-samples features, so that fine details are retained for small objects. The header outputs pixel-wise predictions for object classification and geometry (heading angle, center offset, width, and length) without using pre-defined anchors. The model is trained with a combination of focal loss for classification, to handle class imbalance, and smooth L1 loss for regression.
Experiments on the KITTI BEV object detection benchmark and the large-scale TOR4D dataset demonstrate that PIXOR achieves state-of-the-art performance. On KITTI, PIXOR attains an Average Precision (AP) of 75.74% at 0.7 IoU, surpassing previous best methods like MV3D. Crucially, PIXOR operates at over 28 frames per second (FPS), significantly faster than competing detectors, which typically run at 1–2 FPS or require additional camera inputs.
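As a rough illustration of the BEV input described in this summary, the sketch below rasterizes a point cloud into per-height-slice occupancy channels plus one intensity channel. It is not the authors' code: the function name `rasterize_bev`, the tiny grid extents, the 1 m resolution, the two height slices, and the choice to average intensity per cell are all assumptions made to keep the example small.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, float]  # (x, y, z, intensity)


def rasterize_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 4.0),    # assumed extents, metres
    y_range: Tuple[float, float] = (-2.0, 2.0),
    z_range: Tuple[float, float] = (-1.0, 1.0),
    resolution: float = 1.0,                      # assumed cell size, metres
    z_bins: int = 2,                              # assumed number of height slices
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Return (occupancy, intensity) grids for a LIDAR sweep.

    occupancy[zi][yi][xi] is 1.0 if any point falls in that cell, else 0.0.
    intensity[yi][xi] is the mean intensity of the points in that BEV column
    (0.0 where the column is empty).
    """
    nx = int(round((x_range[1] - x_range[0]) / resolution))
    ny = int(round((y_range[1] - y_range[0]) / resolution))
    dz = (z_range[1] - z_range[0]) / z_bins

    occupancy = [[[0.0] * nx for _ in range(ny)] for _ in range(z_bins)]
    intensity_sum = [[0.0] * nx for _ in range(ny)]
    count = [[0] * nx for _ in range(ny)]

    for x, y, z, r in points:
        inside = (
            x_range[0] <= x < x_range[1]
            and y_range[0] <= y < y_range[1]
            and z_range[0] <= z < z_range[1]
        )
        if not inside:
            continue  # drop points outside the region of interest
        # Clamp indices to guard against floating-point rounding at the upper edge.
        xi = min(nx - 1, int((x - x_range[0]) / resolution))
        yi = min(ny - 1, int((y - y_range[0]) / resolution))
        zi = min(z_bins - 1, int((z - z_range[0]) / dz))
        occupancy[zi][yi][xi] = 1.0
        intensity_sum[yi][xi] += r
        count[yi][xi] += 1

    intensity = [
        [intensity_sum[j][i] / count[j][i] if count[j][i] else 0.0 for i in range(nx)]
        for j in range(ny)
    ]
    return occupancy, intensity


# Hypothetical three-point sweep; the last point lies outside the grid.
sweep = [(0.5, -1.5, -0.5, 0.2), (0.7, -1.2, 0.5, 0.4), (10.0, 0.0, 0.0, 1.0)]
occ, inten = rasterize_bev(sweep)
```

Stacking the occupancy slices with the intensity grid gives an image-like tensor that ordinary 2D convolutions can consume, which is the efficiency argument the summary makes for BEV.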
Ablation studies confirm that the BEV representation, the specific backbone architecture with up-sampling, and the use of focal loss are key contributors to its performance. The detector also shows strong generalization on the TOR4D dataset, maintaining high accuracy across varying distances.
The significance of this work lies in demonstrating that accurate 3D object detection can be achieved in real-time using LIDAR data alone, without the computational overhead of 3D convolutions or the latency of two-stage proposal-based detectors. By simplifying the detection pipeline to a single-stage dense prediction on a BEV representation, PIXOR provides a practical solution for safety-critical autonomous driving applications where both speed and precision are paramount.
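The training objective mentioned in the summary (focal loss for classification, smooth L1 for regression) can be sketched per pixel as follows. This is a minimal illustration, not the paper's implementation: the hyper-parameters `alpha=0.25`, `gamma=2.0`, and `beta=1.0` are commonly used defaults assumed here, `pixel_loss` is a hypothetical helper, and applying the regression term only to positive pixels is likewise an assumption.

```python
import math
from typing import Sequence


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one pixel; p is the predicted object probability."""
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-7), 1.0)              # avoid log(0)
    # The (1 - p_t)^gamma factor down-weights pixels that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smooth L1: quadratic near zero, linear for large errors."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


def pixel_loss(p: float, y: int, reg_pred: Sequence[float], reg_target: Sequence[float]) -> float:
    """Hypothetical combined loss for one BEV pixel."""
    cls = focal_loss(p, y)
    # Assumption: geometry is only supervised where an object is present.
    reg = sum(smooth_l1(a, b) for a, b in zip(reg_pred, reg_target)) if y == 1 else 0.0
    return cls + reg
```

Because most BEV pixels are background, the focal term keeps the many easy negatives from dominating the gradient, which is consistent with the ablation finding reported above.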
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | OpenAlex-citations | — | — | 1 | 2026-06-18 |
| archive | success | semantic_scholar | — | — | 6 | 2026-06-25 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-18 |
| chunk | success | chunk | — | — | 1 | 2026-06-18 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-18 |
| promote | success | — | — | — | 1 | 2026-06-18 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-18 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.