Improved object detection method for autonomous driving based on DETR
DOI: 10.3389/fnbot.2024.1484276
Archive: archived; pipeline: cataloged; verified
Summary
This paper addresses limitations of current object detection methods for autonomous driving, focusing on the Detection Transformer (DETR) architecture. The authors identify three primary challenges in existing DETR-like models: weak multi-scale detection and imprecise localization, the high computational cost of global attention in the encoder, and slow convergence caused by manually fixed loss-function weights. To resolve these issues, the study proposes an improved detection method that combines multi-scale feature extraction, an efficient transformer encoder, and dynamic hyperparameter tuning.

The proposed method has three components. First, a multi-scale feature and location information extraction module enhances the backbone network. This module uses residual partition units to extract features at varying scales and adds a coordinate attention mechanism to capture precise positional information, which addresses the difficulty of detecting small, distant targets alongside large, nearby ones. Second, the authors develop a transformer encoder based on a group axial attention mechanism. This design splits attention computation into horizontal and vertical groups that can be processed in parallel, balancing local and global information while reducing computational overhead compared to standard multi-head attention. Third, a dynamic hyperparameter tuning method based on Pareto efficiency automatically adjusts the weights of the classification and regression loss functions during training, replacing manual weight setting and improving model convergence.

Experimental results show that the proposed method outperforms existing techniques on multiple benchmarks. On the COCO, PASCAL VOC, and KITTI datasets, the method achieved improvements in average precision of 3.3%, 4.5%, and 3%, respectively. The optimized encoder structure also produced an 84% increase in frames per second (FPS), a substantial gain in inference speed. These findings indicate that combining multi-scale feature extraction, efficient axial attention, and dynamic loss weighting improves both the accuracy and the real-time performance of object detection systems.

The significance of this work lies in its contribution to robust, efficient perception systems for autonomous driving. By addressing three specific bottlenecks of DETR architectures (multi-scale detection, computational complexity, and training stability), the proposed method offers a more viable option for real-world deployment. The improvements in both precision and speed suggest that this approach can better handle complex driving scenarios involving occlusions, varying object sizes, and dynamic backgrounds.
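To make the encoder idea in the summary more concrete, here is a minimal NumPy sketch of attention split into a horizontal and a vertical group. It is not the authors' implementation. It assumes the channels are split into two equal halves, uses a single head, and sets Q = K = V to the input with no learned projections. The names `axis_attention`, `group_axial_attention`, and `attention_score_counts` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(x):
    """Scaled dot-product self-attention over the middle axis of x.

    x has shape (B, L, D): B independent sequences of L tokens.
    Assumption: Q = K = V = x (no learned projections), single head.
    """
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)  # (B, L, L)
    return softmax(scores, axis=-1) @ x             # (B, L, D)


def group_axial_attention(feat):
    """Apply row-wise and column-wise attention to two channel groups.

    feat has shape (H, W, C) with even C. The first half of the channels
    attends along each row (horizontal group); the second half attends
    along each column (vertical group). The two groups are independent,
    so they could be computed in parallel.
    """
    h, w, c = feat.shape
    if c % 2 != 0:
        raise ValueError("channel count must be even for a two-way split")
    horiz, vert = feat[..., : c // 2], feat[..., c // 2:]

    # Horizontal group: H sequences (rows), each of W tokens.
    out_h = axis_attention(horiz)
    # Vertical group: W sequences (columns), each of H tokens.
    out_v = axis_attention(vert.transpose(1, 0, 2)).transpose(1, 0, 2)

    return np.concatenate([out_h, out_v], axis=-1)  # (H, W, C)


def attention_score_counts(h, w):
    """Number of attention scores: full global attention vs. axial."""
    full = (h * w) ** 2        # every token attends to every token
    axial = h * w * (h + w)    # H rows of W*W plus W columns of H*H
    return full, axial


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(32, 32, 64))
    print(group_axial_attention(feature_map).shape)
    print(attention_score_counts(32, 32))
```

For a 32 × 32 feature map, global attention computes 1,048,576 pairwise scores while the axial variant computes 65,536, a 16-fold reduction. This count only illustrates why restricting attention to rows and columns is cheaper; it does not reproduce the FPS gain reported in the paper, which depends on the full model and hardware.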
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | DOAJ | — | — | 1 | 2026-06-17 |
| archive | success | unpaywall | — | — | 1 | 2026-06-25 |
| extract | success | cached | — | — | 2 | 2026-06-25 |
| clean | success | clean | — | — | 1 | 2026-06-18 |
| chunk | success | chunk | — | — | 1 | 2026-06-18 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-18 |
| promote | success | — | — | — | 1 | 2026-06-17 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-25 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-18 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-25; verification: verified.
Topics
Ranked by relevance to this paper.