DivNEDS: Diverse Naturalistic Edge Driving Scene Dataset for Autonomous Vehicle Scene Understanding
DOI: 10.1109/access.2024.3394530
archive: archived; pipeline: cataloged; verified
Summary
This paper addresses the critical challenge of interpretable scene understanding for Autonomous Vehicles (AVs), particularly in out-of-distribution and edge scenarios. The authors note that 84% of AV disengagements in real-world tests are attributed to scene understanding errors, such as failing to interpret unpredictable maneuvers, erratic pedestrian behavior, or complex interactions. Existing datasets often suffer from limited geographic scope, homogeneous environments, and annotation strategies that include irrelevant background information, leading to poor generalization.

To mitigate these issues, the study introduces DivNEDS (Diverse Naturalistic Edge Driving Scene Dataset) and a novel annotation strategy to enhance AV safety and adaptability. The DivNEDS dataset comprises 11,084 edge scenes and 203,000 descriptive captions sourced from 12 distinct global locations, including cities in the US, UK, Australia, India, Indonesia, South Africa, and Nigeria. The data captures diverse conditions, including varying weather (snow, fog, rain), lighting (day and night), and road types (rural and urban).

The authors propose an "embedded hierarchical dense captioning" strategy, which uses three levels of annotation: low-level (objects with attributes), middle-level (relationships and actions), and high-level (scene context). This nested approach aims to eliminate irrelevant background features and enable few-shot learning. Additionally, the paper introduces DivNET, a Generative Region-to-Text Transformer designed to process these hierarchical annotations.

The dataset was curated through a combination of original captures and publicly available images, annotated by four transportation engineering experts over 12 months. The study establishes a new benchmark for AV scene understanding models using dense captioned data. The embedded hierarchical dense captioning strategy achieved a baseline performance of 60.3 mean Average Precision (mAP).
The authors demonstrate that this hierarchical approach creates a more effective feature space than traditional single-scene bounding boxes, which often dilute features with irrelevant background information. By embedding low- and middle-level information into high-level scene captions, the method allows models to learn associative relationships between receptive fields at different annotation levels. The dataset retains an authentic distribution of real-world edge scenarios rather than artificially balancing classes, so that the model's training reflects actual encounter frequencies.

The significance of this work lies in its contribution to robust, generalizable AV perception systems. By providing a diverse, globally sourced dataset focused on complex edge cases, DivNEDS addresses the limitations of prior datasets that were restricted to specific cities or idealistic conditions. The embedded hierarchical captioning method offers a pathway to improve interpretability and reduce overfitting, enabling AVs to better comprehend dynamic environments. This advancement supports the development of safer autonomous driving systems capable of handling the unpredictable nature of real-world traffic, thereby facilitating broader adoption and meeting stringent safety requirements.
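The embedded hierarchy described above can be pictured as a tree of captioned regions, where low-level object boxes sit inside middle-level interaction boxes, which in turn sit inside a high-level scene. The Python sketch here is a minimal illustration under stated assumptions: the `Region` record, the `(x1, y1, x2, y2)` pixel box format, the rule that each child must be at a strictly lower level and lie inside its parent's box, and the `flatten` helper that pairs every region with its enclosing caption are all hypothetical. None of them is the published DivNEDS annotation schema or the DivNET input format, and the example scene is made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical annotation levels, ordered from finest to coarsest.
LEVELS = ("low", "middle", "high")


@dataclass
class Region:
    """One captioned region; illustrative layout, not the DivNEDS schema."""
    level: str                                # "low", "middle", or "high"
    box: tuple[float, float, float, float]    # assumed (x1, y1, x2, y2) pixels
    caption: str
    children: list[Region] = field(default_factory=list)


def contains(outer, inner):
    """Return True if box `inner` lies entirely within box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def validate(region):
    """Assumed rule: every child is at a strictly lower level and nested in its parent."""
    for child in region.children:
        if LEVELS.index(child.level) >= LEVELS.index(region.level):
            return False
        if not contains(region.box, child.box):
            return False
        if not validate(child):
            return False
    return True


def flatten(region, parent_caption=None):
    """Yield (level, box, caption, parent_caption) tuples in top-down order."""
    yield (region.level, region.box, region.caption, parent_caption)
    for child in region.children:
        yield from flatten(child, region.caption)


# Made-up example scene, for illustration only.
scene = Region("high", (0, 0, 1280, 720),
               "pedestrian crossing against the signal at a rainy urban intersection",
               children=[
                   Region("middle", (400, 200, 900, 650),
                          "sedan braking for a crossing pedestrian",
                          children=[
                              Region("low", (420, 300, 700, 640), "red sedan with brake lights on"),
                              Region("low", (720, 250, 850, 640), "pedestrian in a dark raincoat"),
                          ]),
               ])

print(validate(scene))   # True
for level, box, caption, parent in flatten(scene):
    print(level, caption, "<-", parent)
```

Carrying the enclosing caption alongside each region is one simple way to expose cross-level context to a region-to-text model; DivNET may combine the levels differently.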
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-25 |
| archive | success | unpaywall | — | — | 2 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-26 |
| chunk | success | chunk | — | — | 1 | 2026-06-26 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-26 |
| enrich | success | openalex | — | — | 1 | 2026-06-26 |
| promote | success | — | — | — | 1 | 2026-06-25 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-26 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.
Information type
What kind of knowledge this paper contributes, grouped by family and independent of topic (what it is about) and method (how it was studied).
- Methodological Resource: dataset resource, tool software