InVDriver: Intra-instance aware vectorized query-based autonomous driving transformer
DOI: 10.26599/jicv.2025.9210060
archive: archived; pipeline: cataloged; verified
Summary
This paper introduces InVDriver, a vectorized query-based transformer designed for end-to-end autonomous driving. The research addresses a critical limitation in existing vectorized frameworks: the assumption that points within structured elements (such as lane lines or trajectories) are independent and identically distributed. This oversight ignores inherent spatial correlations among intra-instance points, leading to geometrically inconsistent outputs like fragmented maps or oscillatory trajectories. By systematically modeling these intra-instance dependencies, InVDriver aims to enhance planning accuracy, trajectory smoothness, and overall system safety.
The proposed system integrates perception, prediction, and planning modules using lightweight vectorized representations. Unlike traditional approaches that treat points as independent, InVDriver employs masked self-attention mechanisms across all core modules. These mechanisms restrict attention to interactions between points within the same instance, thereby suppressing irrelevant inter-instance noise and coordinating the refinement of structural elements. The perception module detects HD map elements, the prediction module forecasts agent trajectories, and the planning module generates ego-vehicle paths. Information flows seamlessly between modules via cross-attention layers, enabling holistic optimization. The model is trained end-to-end using a combination of L1-distance losses for pointwise accuracy and a vectorized constraint loss to preserve structural coherence.
Experiments conducted on the nuScenes benchmark demonstrate that InVDriver achieves state-of-the-art performance. It surpasses prior methods, including the vision-based VAD, in both accuracy and safety metrics. Specifically, InVDriver reduces average displacement error by 36% compared to VAD, with errors of 0.26 m at 1 s, 0.46 m at 2 s, and 0.78 m at 3 s. Additionally, it lowers collision rates by 26%. Despite these performance gains, the system maintains high computational efficiency, operating at 15.3 frames per second, which is comparable to other camera-only methods. Ablation studies confirm that the intra-instance attention mechanism is responsible for the improved temporal consistency and geometric precision, particularly evident in longer-horizon predictions.
The significance of this work lies in its validation of intra-instance modeling as a critical factor for robust autonomous driving systems. By addressing the geometric inconsistencies inherent in previous vectorized approaches, InVDriver provides a more reliable framework for handling complex traffic scenarios. The results suggest that explicitly encoding spatial dependencies within structured elements significantly improves the stability and interpretability of driving decisions. This approach offers a practical pathway for deploying efficient, safe, and accurate end-to-end autonomous driving systems, balancing high performance with computational feasibility.
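The error figures quoted in the summary can be cross-checked with a few lines of arithmetic. The sketch below averages the three per-horizon errors and back-calculates the baseline that a 36% relative reduction would imply. The summary does not state VAD's baseline value, and reading the 36% as a reduction in the three-horizon mean is an assumption made here; the helper names `mean_error` and `implied_baseline` are illustrative only.

```python
def mean_error(errors_m):
    """Arithmetic mean of a list of errors in metres."""
    return sum(errors_m) / len(errors_m)


def implied_baseline(value, relative_reduction):
    """Baseline b such that value == b * (1 - relative_reduction)."""
    return value / (1.0 - relative_reduction)


# Per-horizon errors quoted in the summary (horizon in seconds -> metres).
l2_errors_m = {1: 0.26, 2: 0.46, 3: 0.78}

avg = mean_error(list(l2_errors_m.values()))

# Assumption: the reported 36% applies to the mean over the three horizons.
baseline = implied_baseline(avg, 0.36)

print(f"average error: {avg:.2f} m")                  # 0.50 m
print(f"implied baseline average: {baseline:.2f} m")  # 0.78 m
```

Under that assumption the quoted errors average 0.50 m, which corresponds to a baseline average of roughly 0.78 m.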
Key finding
InVDriver achieves state-of-the-art performance on the nuScenes benchmark by reducing displacement errors and collision rates through the explicit modeling of intra-instance spatial dependencies using masked self-attention.
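The masking idea behind this finding can be illustrated with a minimal NumPy sketch: attention between two points is allowed only when they belong to the same instance. This is not the paper's implementation. It assumes a single head, omits the learned query/key/value projections, and uses hypothetical names (`intra_instance_mask`, `masked_self_attention`) and toy data.

```python
import numpy as np


def intra_instance_mask(instance_ids):
    """Boolean (P, P) mask, True where two points share an instance."""
    ids = np.asarray(instance_ids)
    return ids[:, None] == ids[None, :]


def masked_self_attention(x, instance_ids):
    """Single-head self-attention restricted to same-instance points.

    x: (P, D) point features; instance_ids: (P,) instance label per point.
    Learned projections are omitted, so queries, keys and values all equal x.
    """
    x = np.asarray(x, dtype=float)
    mask = intra_instance_mask(instance_ids)
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(mask, scores, -np.inf)     # block inter-instance pairs
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


# Toy input: two instances (say, two lane lines) with 3 and 2 points each.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))
ids = [0, 0, 0, 1, 1]
out, attn = masked_self_attention(features, ids)
```

Every point attends at least to itself, so each row of `attn` is a valid distribution, and the weights between points of different instances are exactly zero.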
Methodology
simulation_modeling
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | author_sweep | — | — | 2 | 2026-05-28 |
| archive | success | canonical_url | — | — | 5 | 2026-06-06 |
| extract | success | cached | — | — | 3 | 2026-06-10 |
| clean | success | clean | — | — | 1 | 2026-06-04 |
| chunk | success | chunk | — | — | 1 | 2026-06-04 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-04 |
| enrich | success | — | — | — | 1 | 2026-05-28 |
| promote | success | — | — | — | 1 | 2026-06-04 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 2 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 15 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Topics
Ranked by relevance to this paper.