Multi-view 3D Object Detection Network for Autonomous Driving

Chen, Xiaozhi; Ma, Huimin; Wan, Ji; Li, Bo; Xia, Tian · 2017 · OpenAlex-citations

DOI: 10.1109/cvpr.2017.691

Summary

This paper introduces the Multi-View 3D Object Detection Network (MV3D), a sensory-fusion framework designed to improve the accuracy of 3D object detection for autonomous driving. The research addresses the challenge of combining LIDAR point clouds, which provide precise depth information, with RGB images, which offer rich semantic details. While existing methods often rely on computationally expensive 3D voxel grids or fail to effectively fuse multimodal data for 3D localization, MV3D proposes a compact multi-view representation and a deep fusion scheme to predict oriented 3D bounding boxes with high precision.

The MV3D architecture consists of two primary subnetworks: a 3D Proposal Network and a Region-based Fusion Network. The proposal network utilizes a bird’s eye view (BEV) representation of the LIDAR point cloud, encoded with height, intensity, and density features, to generate efficient 3D candidate boxes. This BEV approach is preferred over front-view projections because it preserves object physical sizes and minimizes occlusion. The region-based fusion network then projects these 3D proposals onto three views: the BEV, a front-view map (encoded with height, distance, and intensity), and the RGB image. Features from these views are extracted via ROI pooling and combined using a deep fusion strategy that enables interactions between intermediate layers of different modalities, rather than simple early or late fusion. The network is regularized using drop-path training and auxiliary losses to ensure robust feature learning from each view.

Experiments conducted on the KITTI benchmark demonstrate that MV3D significantly outperforms state-of-the-art methods. The proposed 3D proposal network achieves a 99.1% recall at an Intersection-over-Union (IoU) threshold of 0.25 using only 300 proposals, surpassing previous methods like 3DOP and Mono3D.
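
The bird's eye view encoding mentioned in the summary (height, intensity, and density per grid cell) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from this entry: the function name `encode_bev`, the grid extents and cell size, the use of a single height channel, taking intensity from the highest point in each cell, and the `log(N + 1) / log(64)` density normalization.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float, float]  # (x, y, z, reflectance)
Grid = List[List[float]]


def encode_bev(
    points: Iterable[Point],
    x_range: Tuple[float, float] = (0.0, 4.0),   # assumed extent, metres
    y_range: Tuple[float, float] = (-2.0, 2.0),  # assumed extent, metres
    cell: float = 1.0,                           # assumed cell size, metres
) -> Tuple[Grid, Grid, Grid]:
    """Project LIDAR points onto a ground-plane grid (hypothetical helper).

    Returns three maps indexed as [x_cell][y_cell]:
    height    - largest z among the points in the cell (0.0 if empty)
    intensity - reflectance of that highest point (0.0 if empty)
    density   - normalized point count, capped at 1.0
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    height = [[0.0] * ny for _ in range(nx)]
    intensity = [[0.0] * ny for _ in range(nx)]
    count = [[0] * ny for _ in range(nx)]

    for x, y, z, reflectance in points:
        # Skip points that fall outside the grid.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        i = min(int((x - x_range[0]) / cell), nx - 1)
        j = min(int((y - y_range[0]) / cell), ny - 1)
        # Keep the highest point seen so far in this cell.
        if count[i][j] == 0 or z > height[i][j]:
            height[i][j] = z
            intensity[i][j] = reflectance
        count[i][j] += 1

    # Assumed normalization: log-scaled count, saturating at 63 points.
    density = [
        [min(1.0, math.log(n + 1) / math.log(64)) for n in row] for row in count
    ]
    return height, intensity, density


# Toy input: two points share a cell, one sits elsewhere, one is out of range.
toy_points = [
    (0.5, -1.5, 0.2, 0.9),
    (0.6, -1.4, 1.0, 0.3),
    (3.5, 1.5, -0.5, 0.7),
    (10.0, 0.0, 0.0, 0.0),
]
bev_height, bev_intensity, bev_density = encode_bev(toy_points)
```

The result is a set of dense 2D maps that an ordinary 2D convolutional network can consume, which is the sense in which the summary calls the representation compact. Because the grid lies in the ground plane, a cell covers the same physical area at any distance from the sensor.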
In terms of detection accuracy, the LIDAR-only variant of MV3D achieves approximately 25% higher Average Precision (AP) for 3D localization and 30% higher AP for 3D detection compared to existing LIDAR-based approaches. Furthermore, it improves 2D detection AP by 10.3% on the hard test set. When RGB images are fused with LIDAR data, performance improves further, achieving 89.56% AP for 3D detection on the moderate difficulty level.

The significance of this work lies in its demonstration that deep, region-based feature fusion across multiple sensor views yields superior performance compared to traditional fusion schemes. By encoding sparse LIDAR data into compact 2D maps and leveraging a hierarchical fusion network, MV3D provides a computationally efficient and highly accurate solution for 3D object detection. This approach highlights the importance of integrating geometric precision from LIDAR with semantic richness from cameras, offering a robust framework for the visual perception systems required in autonomous driving applications.
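
The contrast between deep fusion and late fusion drawn in the summary can be made concrete with a toy sketch. It assumes that per-view region features are plain vectors, that views are joined by an element-wise mean, and that each "layer" is a scalar scale-and-ReLU stand-in for a learned layer. The names `deep_fusion`, `late_fusion`, and `make_layer` are hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float, bias: float) -> Layer:
    """Hypothetical stand-in for a learned layer: ReLU(scale * x + bias)."""
    return lambda x: [max(0.0, scale * v + bias) for v in x]


def elementwise_mean(features: Sequence[Vector]) -> Vector:
    """Join operation assumed here: average the views position by position."""
    n = len(features)
    return [sum(values) / n for values in zip(*features)]


def deep_fusion(views: Sequence[Vector], levels: Sequence[Sequence[Layer]]) -> Vector:
    """Re-join the views after every level, so each view-specific layer
    receives features already mixed with the other views."""
    fused = elementwise_mean(views)
    for layers in levels:
        fused = elementwise_mean([layer(fused) for layer in layers])
    return fused


def late_fusion(views: Sequence[Vector], levels: Sequence[Sequence[Layer]]) -> Vector:
    """Run every view through its own stack and join only at the end."""
    outputs = []
    for v, feature in enumerate(views):
        for layers in levels:
            feature = layers[v](feature)
        outputs.append(feature)
    return elementwise_mean(outputs)


# Toy region features for the BEV, front-view, and RGB branches.
views = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# levels[l][v] is the layer applied at depth l on the branch for view v.
levels = [
    [make_layer(1.0, 0.0), make_layer(2.0, 0.0), make_layer(3.0, 0.0)],
    [make_layer(1.0, -7.0), make_layer(1.0, -7.0), make_layer(1.0, -7.0)],
]
deep_out = deep_fusion(views, levels)
late_out = late_fusion(views, levels)
```

On the same inputs and layers the two schemes produce different outputs, because in the deep variant every intermediate layer sees information from all three views, while in the late variant the branches stay isolated until the final join.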

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	OpenAlex-citations	—	—	1	2026-06-18
archive	success	semantic_scholar	—	—	6	2026-06-25
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-18
chunk	success	chunk	—	—	1	2026-06-18
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-18
promote	success	—	—	—	1	2026-06-18
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-18
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

distraction detection algorithms