Visual Object Recognition with 3D-Aware Features in KITTI Urban Scenes
DOI: 10.3390/s150409228
Archive: archived. Pipeline: cataloged, verified.
Full text: available at the source via the DOI above (linked, not hosted here).
Summary
This paper addresses the detection and orientation estimation of road participants (specifically cars, pedestrians, and cyclists) in complex, naturalistic urban environments. The authors argue that LiDAR provides accurate 3D data but is expensive, whereas vision sensors offer rich semantic information but lack inherent depth. To bridge this gap, the study proposes cost-effective stereo cameras that capture both appearance (color) and depth cues (disparity). The primary motivation is to improve object recognition for autonomous driving by exploiting 3D-aware features on the KITTI urban scene dataset, which is challenging because of occlusion, varying illumination, and diverse object viewpoints.

The methodology extends the Deformable Part-based Model (DPM), a widely used, discriminatively trained object detector, to incorporate 2.5D data. Because the KITTI dataset does not provide disparity maps, the authors compute them from the stereo image pairs with the Semi-Global Matching (SGM) algorithm. They then introduce several "3D-aware features" built from Histograms of Oriented Gradients (HOG) computed on both the color and the disparity images. These include concatenating the color and disparity HOGs, adding disparity statistics, and intersecting the two feature sets. The authors specifically compare contrast-sensitive and contrast-insensitive histograms on disparity data. They also modify the DPM training pipeline with adaptive part sizes and fixed viewpoint variables to better handle intra-class variability.

The system is evaluated with five-fold cross-validation on the KITTI training set, using Average Precision (AP), Average Orientation Similarity (AOS), and Log-Average Miss Rate (LAMR) as metrics. The experimental results show that incorporating disparity information significantly improves detection performance.
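A minimal NumPy sketch can illustrate how contrast-sensitive and contrast-insensitive histograms differ on a disparity map. It assumes a precomputed disparity map, for instance one produced by SGM. The histograms are simplified HOG-style cells with hard binning and no block normalization, so this is neither the Felzenszwalb DPM feature pipeline nor the authors' implementation.

Several choices here are illustrative assumptions:

- the function names (hypothetical);
- the 8-pixel cell size;
- a grayscale image standing in for the color channels;
- doubling the bin count for signed orientations.

Contrast-sensitive binning keeps the gradient sign over [0, 2π). As a result, a disparity edge rising onto a nearer object lands in a different bin from one falling off it. Contrast-insensitive binning folds orientations into [0, π) and merges the two cases. Per-cell concatenation is one plausible reading of "concatenating color and disparity HOGs".

```python
import numpy as np


def orientation_histograms(image, cell=8, n_bins=9, contrast_sensitive=False):
    """Per-cell, magnitude-weighted gradient orientation histograms (hard binning).

    Returns an array of shape (rows, cols, n_bins) for non-overlapping cells.
    """
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows (y), columns (x)
    magnitude = np.hypot(gx, gy)
    # Signed orientations span [0, 2*pi); unsigned ones fold opposite directions into [0, pi).
    period = 2 * np.pi if contrast_sensitive else np.pi
    angle = np.mod(np.arctan2(gy, gx), period)
    # Clip guards against angle == period caused by floating-point rounding in np.mod.
    bins = np.minimum((angle / period * n_bins).astype(int), n_bins - 1)
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bins[window].ravel(),
                                     weights=magnitude[window].ravel(),
                                     minlength=n_bins)
    return hist


def color_disparity_features(gray, disparity, cell=8, n_bins=9):
    """Hypothetical 3D-aware feature: per-cell concatenation of an unsigned
    intensity histogram and a signed (contrast-sensitive) disparity histogram."""
    color_hist = orientation_histograms(gray, cell, n_bins, contrast_sensitive=False)
    disparity_hist = orientation_histograms(disparity, cell, 2 * n_bins, contrast_sensitive=True)
    return np.concatenate([color_hist, disparity_hist], axis=-1)


ramp = np.tile(np.arange(16.0), (16, 1))  # disparity increasing to the right
mirrored = ramp[:, ::-1]                  # disparity decreasing to the right
print(orientation_histograms(ramp, n_bins=18, contrast_sensitive=True)[0, 0].argmax())      # 0
print(orientation_histograms(mirrored, n_bins=18, contrast_sensitive=True)[0, 0].argmax())  # 9
print(orientation_histograms(mirrored, n_bins=9, contrast_sensitive=False)[0, 0].argmax())  # 0
print(color_disparity_features(ramp, mirrored).shape)                                       # (2, 2, 27)
```

On the ramp, every 8×8 cell puts all of its gradient energy into bin 0. Mirroring the ramp moves that energy to bin 9 under contrast-sensitive binning but leaves it in bin 0 under contrast-insensitive binning. The signed variant therefore keeps the near-versus-far direction of a depth edge that the unsigned variant discards.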
Among the proposed features, the contrast-sensitive disparity features (C8B1) yielded the best overall results, outperforming the baseline DPM that used only color features. For car detection, the C8B1 features achieved higher AP and AOS and lower LAMR at all three difficulty levels (easy, moderate, and hard). The study confirms that disparity gradients carry information complementary to color gradients, which yields richer object models. The approach was ranked on the KITTI website as the first work to report results using stereo data for the KITTI object challenge. Overall, the findings indicate that 3D-aware features derived from stereo vision capture both the appearance and the depth characteristics of objects. These features increase detection rates for cars and cyclists compared with monocular baselines.

The significance of this work lies in showing that affordable stereo vision systems can supply enough 3D cues to improve autonomous vehicle perception without relying on expensive LiDAR sensors. By integrating depth information into the DPM framework, the authors offer a practical approach to robust object detection and orientation estimation in dynamic urban scenarios. More broadly, the work supports the value of 2.5D data for semantic scene understanding. It also points toward advanced driver assistance systems built on visual sensors.
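AP and AOS can be made concrete with a simplified Python sketch of the 11-point interpolated versions used by the KITTI object benchmark at the time. It assumes each detection has already been matched to ground truth and labeled with its orientation error. KITTI's overlap-based matching, difficulty filtering, and "don't care" regions are omitted, and LAMR is not covered. The function names and the tuple layout are hypothetical.

Orientation similarity rewards each matched detection by (1 + cos Δθ)/2, and unmatched detections contribute zero. Because this term never exceeds 1, AOS can never exceed AP.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (score, matched_to_ground_truth, orientation_error_in_radians)
Detection = Tuple[float, bool, float]


def ranked_curve(detections: Sequence[Detection],
                 num_ground_truth: int) -> List[Tuple[float, float, float]]:
    """Return (recall, precision, orientation_similarity) after each ranked detection."""
    ranked = sorted(detections, key=lambda det: det[0], reverse=True)
    true_positives = 0
    similarity_sum = 0.0
    curve = []
    for rank, (_score, matched, delta_theta) in enumerate(ranked, start=1):
        if matched:
            true_positives += 1
            # Cosine similarity mapped to [0, 1]; unmatched detections add nothing.
            similarity_sum += (1.0 + math.cos(delta_theta)) / 2.0
        curve.append((true_positives / num_ground_truth,
                      true_positives / rank,
                      similarity_sum / rank))
    return curve


def eleven_point_average(curve, column: int) -> float:
    """Average the interpolated value (max over recall >= r) at r = 0, 0.1, ..., 1."""
    total = 0.0
    for step in range(11):
        r = step / 10
        candidates = [point[column] for point in curve if point[0] >= r]
        total += max(candidates, default=0.0)
    return total / 11


def average_precision(detections: Sequence[Detection], num_ground_truth: int) -> float:
    return eleven_point_average(ranked_curve(detections, num_ground_truth), column=1)


def average_orientation_similarity(detections: Sequence[Detection],
                                   num_ground_truth: int) -> float:
    return eleven_point_average(ranked_curve(detections, num_ground_truth), column=2)


# A correct box whose heading is off by 180 degrees counts fully for AP but not for AOS.
flipped = [(0.9, True, math.pi)]
print(average_precision(flipped, 1), average_orientation_similarity(flipped, 1))
```

This split is why the paper reports both metrics. AP measures localization alone, while the gap between AP and AOS shows how often the estimated orientation is wrong.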
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-25 |
| archive | success | openalex | — | — | 5 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-25 |
| chunk | success | chunk | — | — | 1 | 2026-06-25 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-25 |
| promote | success | — | — | — | 1 | 2026-06-25 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-25 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.