Exploring augmentation strategies in mixed reality for autonomous driving with depth cameras

Argui, Imane; Gueriau, Maxime; Ainouz, Samia · 2024 · Crossref

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the shortage of diverse real-world data and the "reality gap" in training autonomous driving systems. Simulation offers controlled environments but lacks the unpredictability of real-world conditions, while pure real-world testing is often unsafe or insufficient for rare scenarios. The authors propose a mixed-reality augmentation strategy that uses depth cameras to bridge this gap by integrating virtual objects into real-world depth maps. The approach aims to expand training datasets and enable safe testing of dangerous scenarios while maintaining the fidelity required for effective perception models.

The methodology has two steps: virtual image processing and an augmentation strategy. Virtual RGB and depth images were generated with the Gazebo simulator, and backgrounds were removed via a visual plugin to isolate the virtual elements. To align the dynamic range of the virtual depth maps with the real-world data, a logarithmic transformation and normalization were applied. The core augmentation strategy compares depth values pixel by pixel between the real and virtual maps. Using a union of conditional intersections, the algorithm replaces real depth values with virtual ones only where virtual objects are closer to the camera, which handles occlusions and ensures accurate spatial placement.

The process was tested offline on the KITTI dataset, which provided 7,481 real RGB images. Depth maps for these real images were generated with the MiDaS monocular depth estimation algorithm. The virtual images comprised 80 samples across four object classes: pedestrians, cars, trucks, and motorcycles. Fusion quality was evaluated quantitatively with Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Pearson correlation, and qualitatively through object detection with a pre-trained Faster R-CNN model.

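The depth-comparison step described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names, the use of `None` to mark background-removed virtual pixels, the `log1p` plus min-max form of the "logarithmic transformation and normalization", and the convention that a smaller value means closer to the camera are all assumptions made for illustration. Some monocular estimators output inverse depth, in which case the comparison direction would flip.

```python
import math

def log_normalize(depth):
    """Log-compress valid pixels, then min-max scale them to [0, 1].

    Assumption: None marks pixels with no virtual object (background removed).
    """
    logged = [[None if d is None else math.log1p(d) for d in row] for row in depth]
    valid = [v for row in logged for v in row if v is not None]
    lo, hi = min(valid), max(valid)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant map
    return [[None if v is None else (v - lo) / span for v in row] for row in logged]

def fuse_depth(real, virtual):
    """Per pixel, keep the virtual depth only where a virtual object exists
    and is closer than the real scene (assumed: smaller value = closer)."""
    return [
        [v if v is not None and v < r else r for r, v in zip(real_row, virt_row)]
        for real_row, virt_row in zip(real, virtual)
    ]

# Toy 2x3 maps, assumed to be on a common scale already.
real = [[0.9, 0.9, 0.2],
        [0.9, 0.5, 0.2]]
virtual = [[None, 0.4, 0.4],
           [None, 0.4, 0.4]]

fused = fuse_depth(real, virtual)
# The middle column takes the virtual object; the right column keeps the
# nearer real surface, so the virtual object is occluded there.
```

The comparison is only meaningful when both maps share a depth scale, which is the role the normalization step plays in the summary. A production pipeline would more likely use an array library than nested lists.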
Quantitative results on the full dataset showed a mean PSNR of 23.88 dB, a mean SSIM of 0.889, and a mean correlation of 0.965, indicating moderate image quality with high structural integrity and strong preservation of intensity patterns. Object detection results showed that the augmented images remained about as detectable as real images. Faster R-CNN achieved a mean Average Precision (mAP) of 0.8134 on augmented images using real ground truth, close to the 0.830 mAP on real images. When tested with virtual ground truth, the augmented images achieved an mAP of 0.869, outperforming the virtual-only baseline of 0.75.

The findings indicate that the proposed augmentation strategy can generate large mixed-reality datasets without significantly compromising image quality or object detection performance, and that it integrates virtual elements into real scenes while accounting for occlusions. The authors note two limitations: virtual objects can appear disproportionately large, and textures can mismatch because of differences in camera orientation. They suggest future improvements in alignment and scaling. This work establishes a foundation for scalable, safe training of autonomous driving models in complex scenarios that are difficult to simulate or capture in reality.
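For readers unfamiliar with the reported metrics, the following is a small sketch of how PSNR and Pearson correlation are conventionally computed between a reference image and a fused image. It uses the standard textbook definitions, not code from the paper. The function names and the 8-bit `max_value` default are assumptions, and SSIM is omitted because it needs windowed statistics that do not fit a short example.

```python
import math

def _flatten(image):
    """Flatten a 2D list of pixel values into one list."""
    return [p for row in image for p in row]

def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref, tst = _flatten(reference), _flatten(test)
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)

def pearson(reference, test):
    """Pearson correlation between pixel intensities.

    Undefined (division by zero) if either image is constant.
    """
    ref, tst = _flatten(reference), _flatten(test)
    mean_r, mean_t = sum(ref) / len(ref), sum(tst) / len(tst)
    cov = sum((a - mean_r) * (b - mean_t) for a, b in zip(ref, tst))
    var_r = sum((a - mean_r) ** 2 for a in ref)
    var_t = sum((b - mean_t) ** 2 for b in tst)
    return cov / math.sqrt(var_r * var_t)

# Toy 2x2 images: every pixel of `shifted` is off by 5 intensity levels.
reference = [[0.0, 100.0], [200.0, 255.0]]
shifted = [[5.0, 105.0], [205.0, 250.0]]

quality_db = psnr(reference, shifted)
correlation = pearson(reference, shifted)
```

Higher PSNR and a correlation closer to 1 both indicate that the fused image stays close to the reference. This matches how the summary reads the reported means.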

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	Crossref	—	—	1	2026-06-20
archive	success	unpaywall	—	—	2	2026-06-26
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-20
chunk	success	chunk	—	—	1	2026-06-20
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-20
promote	success	—	—	—	1	2026-06-20
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-20
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

hud ar windshield