Assessment of Contextual Complexity and Risk Using Unsupervised Clustering Approaches with Dynamic Traffic Condition Data Obtained from Autonomous Vehicles
Summary
This research addresses the limitations of traditional road safety assessments, which rely on static metrics like Annual Average Daily Traffic (AADT) and historical crash data. These methods fail to capture the fast-changing dynamics of the driving environment, such as interactions with vehicles, pedestrians, and bicyclists, which significantly influence contextual complexity and accident risk. To bridge this gap, the study proposes a method to quantify the contextual complexity of roadway environments using dynamic traffic condition data obtained from autonomous vehicles (AVs). The goal is to develop a Contextual Complexity Factor (CCF) model that incorporates real-time dynamic interaction metrics, thereby improving risk estimation, safety research, and applications such as driver rehabilitation and AV route planning.
The study utilized the Waymo Open Dataset, comprising 798 perception data trips and 158,090 LiDAR point cloud frames collected in San Francisco, Phoenix, and Mountain View. The researchers processed raw LiDAR data to extract object types (vehicles, pedestrians, bicyclists, signs) and their spatial coordinates. Feature engineering involved calculating the Stopping Sight Distance (SSD) and Useful Field of View (UFOV) based on vehicle speed to construct a three-dimensional Cone of Vision (COV). This allowed for the identification of objects within the driver’s relevant visual field.
Two analytical approaches were employed: a statistical model using inverse distance weighting to calculate the CCF, and an unsupervised machine learning approach using k-means and hierarchical clustering. Principal Component Analysis (PCA) and correlation analysis identified object velocity, object density, and object proximity as the critical variables for clustering. The results demonstrated that both statistical and machine learning models effectively predicted dynamic complexity.
The statistical approach categorized frames into high, medium, and low complexity based on CCF quartiles. However, the unsupervised clustering provided more granular insights by identifying three distinct clusters corresponding to low-, medium-, and high-complexity environments. Cluster zero represented low-complexity environments characterized by low velocity and low object density. Cluster one indicated medium complexity with higher velocities and moderate density and proximity. Cluster two defined high-complexity environments, such as central business districts, featuring high object density and proximity. The study established specific dynamic ranges for attributes, such as velocity (0–28 mph for low complexity) and object count (128–209 for high complexity), to categorize driving scenes.
The significance of this work lies in its ability to identify and predict high-risk environments in real time, offering substantial benefits for safety research, auto-insurance risk assessment, and autonomous vehicle development. Specifically, the methodology supports Driving Rehabilitation Specialists in scoring dynamic complexity during on-road evaluations for medically at-risk drivers, enhancing the consistency and validity of these assessments. By moving beyond static traffic measures to incorporate dynamic visual demand and cognitive load factors, this research provides a foundational tool for understanding how specific combinations of speed, density, and proximity increase crash likelihood, ultimately contributing to safer driving environments and more robust AV systems.
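The statistical approach described in this summary (a speed-dependent cone of vision, inverse distance weighting, and quartile-based categories) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's own formulation: the SSD expression is the common AASHTO-style form with a 2.5 s reaction time and 11.2 ft/s² deceleration, the cone is flattened to two dimensions with a fixed hypothetical half-angle standing in for the UFOV, the weight is a plain 1/distance, and the low/medium/high split uses the first and third quartiles. The function names are illustrative only.

```python
import math
import statistics


def stopping_sight_distance_ft(speed_mph, reaction_s=2.5, decel_ftps2=11.2):
    """Reaction distance plus braking distance, in feet (AASHTO-style form)."""
    return 1.47 * speed_mph * reaction_s + 1.075 * speed_mph ** 2 / decel_ftps2


def in_cone(x_ft, y_ft, reach_ft, half_angle_deg):
    """True if a point (x ahead, y to the left) lies inside the flat cone."""
    dist = math.hypot(x_ft, y_ft)
    if x_ft <= 0 or dist > reach_ft:
        return False
    return abs(math.degrees(math.atan2(y_ft, x_ft))) <= half_angle_deg


def ccf(objects_xy, speed_mph, half_angle_deg=30.0, min_dist_ft=1.0):
    """Sum of 1/distance over objects inside the cone (assumed weighting)."""
    reach = stopping_sight_distance_ft(speed_mph)
    return sum(
        1.0 / max(math.hypot(x, y), min_dist_ft)  # clamp to avoid huge weights
        for x, y in objects_xy
        if in_cone(x, y, reach, half_angle_deg)
    )


def complexity_labels(scores):
    """Label each score low/medium/high using the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return ["low" if s < q1 else "high" if s > q3 else "medium" for s in scores]


# Hypothetical frame: object positions in feet relative to the vehicle.
frame = [(10, 0), (20, 0), (500, 0), (-10, 0), (10, 50)]
print(round(ccf(frame, speed_mph=30), 3))
```

In this sketch only the two objects directly ahead and within the stopping sight distance contribute to the score; the object beyond reach, the one behind the vehicle, and the one far off-axis are excluded by the cone test. Closer objects contribute more, which is the intent of inverse distance weighting.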
Key finding
Unsupervised clustering of LiDAR-derived variables including velocity, object density, and proximity successfully categorized driving environments into distinct complexity levels, offering a more granular assessment of dynamic risk than traditional statistical quartiles.
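As a rough illustration of the clustering step, the sketch below runs a plain k-means (Lloyd's algorithm) with k = 3 on per-frame velocity, object count, and proximity. It is a sketch under stated assumptions, not the study's pipeline: the feature values are synthetic, the proximity column is a hypothetical score where larger means closer, the farthest-first seeding is chosen here only to make the run deterministic, and the PCA and hierarchical clustering steps are omitted.

```python
import math


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def farthest_first(rows, k):
    """Seed with the first row, then repeatedly add the most distant row."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(rows, k=3, iters=50):
    """Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic frames: (velocity mph, object count, proximity score).
frames = [
    (10, 20, 0.10), (12, 25, 0.12), (8, 18, 0.08),     # slow, sparse
    (45, 70, 0.30), (50, 75, 0.35), (48, 65, 0.28),    # faster, moderate
    (15, 150, 0.80), (18, 180, 0.85), (12, 200, 0.90), # dense, close
]
labels, _ = kmeans(standardize(frames), k=3)
print(labels)
```

Cluster indices produced by k-means are arbitrary, so mapping them to low, medium, and high complexity requires inspecting each cluster's feature ranges afterwards, as the summary describes for velocity and object count.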
Methodology
dataset
Sample size: 798 perception data trips (158,090 LiDAR point cloud frames)
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | rosap | — | — | 2 | 2026-05-23 |
| archive | success | — | — | — | 1 | 2026-05-23 |
| extract | success | cached | — | — | 2 | 2026-06-10 |
| clean | success | — | — | — | 1 | 2026-06-01 |
| chunk | success | — | — | — | 1 | 2026-06-01 |
| embed | success | — | — | — | 1 | 2026-06-02 |
| enrich | success | — | — | — | 1 | 2026-05-23 |
| promote | success | — | — | — | 1 | 2026-05-23 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 3 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 19 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: crash risk outcomes
- Methodological Resource: dataset resource
- Theoretical Contribution: computational model