Improved object detection method for autonomous driving based on DETR

Zhao, Huaqi; Zhang, Songnan; Peng, Xiang; Lu, Zhengguang; Li, Guojing · 2025 · DOAJ

DOI: 10.3389/fnbot.2024.1484276

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the limitations of current object detection methods in autonomous driving, focusing on the Detection Transformer (DETR) architecture. The authors identify three primary challenges in existing DETR-like models: inadequate multi-scale detection and localization precision, the high computational cost of global attention in the encoder, and slow convergence caused by manually fixed loss function weights. To resolve these issues, the study proposes an improved object detection method that integrates multi-scale feature extraction, an efficient transformer encoder, and dynamic hyperparameter tuning.

The proposed method consists of three key components. First, a multi-scale feature and location information extraction module is introduced to enhance the backbone network. This module uses residual partition units to extract features at varying scales and incorporates a coordinate attention mechanism to capture precise positional information, addressing the difficulty of detecting small, distant targets alongside large, nearby ones.

Second, the authors develop a transformer encoder based on a group axial attention mechanism. This design splits attention computation into horizontal and vertical groups, allowing efficient parallel processing that balances local and global information while significantly reducing computational overhead compared to standard multi-head attention.

Third, a novel dynamic hyperparameter tuning method based on Pareto efficiency is implemented. This approach automatically adjusts the weights of the classification and regression loss functions during training, overcoming the inefficiencies of manual weight setting and improving model convergence.

Experimental results demonstrate that the proposed method outperforms existing techniques across multiple benchmarks. On the COCO, PASCAL VOC, and KITTI datasets, the method achieved improvements in average precision of 3.3%, 4.5%, and 3%, respectively. Additionally, the optimized encoder structure resulted in an 84% increase in frames per second (FPS), a significant gain in inference speed. These findings indicate that the integration of multi-scale feature extraction, efficient axial attention, and dynamic loss weighting enhances both the accuracy and the real-time performance of object detection systems.

The significance of this work lies in its contribution to the development of robust, efficient perception systems for autonomous driving. By addressing three specific bottlenecks of DETR architectures (multi-scale detection, computational complexity, and training stability), the proposed method offers a more viable solution for real-world deployment. The improvements in both precision and speed suggest that this approach can better handle complex driving scenarios involving occlusions, varying object sizes, and dynamic backgrounds, thereby advancing the reliability of autonomous vehicle navigation.
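To make the encoder cost argument in the summary concrete, the sketch below counts query-key pairs for one attention head over an H x W feature map, comparing global attention with plain axial attention (one pass along rows, one along columns). It is an illustration under assumptions: the function name `attention_pairs` and the 25 x 34 map size are hypothetical, and the paper's group axial attention may partition the computation differently. The count is not meant to reproduce the reported FPS figure.

```python
def attention_pairs(height, width):
    """Count query-key pairs for one attention head over an H x W feature map.

    Assumption: plain axial attention (row pass + column pass), which may
    differ from the grouped variant described in the paper.
    """
    tokens = height * width
    full = tokens * tokens        # global: every position attends to every position
    row_pass = tokens * width     # each position attends along its own row
    col_pass = tokens * height    # each position attends along its own column
    axial = row_pass + col_pass
    return full, axial


if __name__ == "__main__":
    # Hypothetical feature-map size, chosen only for illustration.
    full, axial = attention_pairs(25, 34)
    print(f"global pairs: {full}, axial pairs: {axial}, ratio: {full / axial:.1f}")
```

The global count grows with (HW)^2 while the axial count grows with HW(H + W), so the gap widens as the feature map gets larger.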
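The summary does not spell out how the Pareto-based weighting is computed. As a hedged illustration of the general idea, the sketch below uses a swapped-in, well-known technique: the closed-form minimum-norm combination of two gradients from multiple-gradient-descent style multi-objective optimization. It picks weights (alpha, 1 - alpha) that minimize the norm of the combined gradient, which yields a direction that does not increase either loss to first order. The function name `min_norm_weights` and the toy gradients are assumptions, not the authors' procedure.

```python
def min_norm_weights(grad_cls, grad_reg):
    """Return (w_cls, w_reg) minimizing ||w_cls * grad_cls + w_reg * grad_reg||.

    Assumption: two objectives only, gradients given as flat lists of floats,
    and weights constrained to be non-negative and sum to 1.
    """
    diff = [a - b for a, b in zip(grad_cls, grad_reg)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any convex combination is equivalent.
        return 0.5, 0.5
    # Unconstrained minimizer of ||grad_reg + alpha * (grad_cls - grad_reg)||^2.
    alpha = sum((b - a) * b for a, b in zip(grad_cls, grad_reg)) / denom
    alpha = min(1.0, max(0.0, alpha))  # project onto [0, 1]
    return alpha, 1.0 - alpha


if __name__ == "__main__":
    # Toy gradients: the classification gradient is twice as large.
    w_cls, w_reg = min_norm_weights([2.0, 0.0], [0.0, 1.0])
    print(w_cls, w_reg)
```

In this toy case the larger gradient receives the smaller weight, so neither loss dominates the update. In a real training loop the weights would be recomputed from the current gradients at each step or every few steps.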

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	DOAJ	—	—	1	2026-06-17
archive	success	unpaywall	—	—	1	2026-06-25
extract	success	cached	—	—	2	2026-06-25
clean	success	clean	—	—	1	2026-06-18
chunk	success	chunk	—	—	1	2026-06-18
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-18
promote	success	—	—	—	1	2026-06-17
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-25
tag	success	vector_similarity	—	—	6	2026-06-18
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-25; verification: verified.

Topics

Ranked by relevance to this paper.

distraction detection algorithms