Big Data Visualization and Spatiotemporal Modeling of Risky Driving
Summary
This study addresses the challenge of identifying and visualizing risky driving behaviors using large-scale vehicle kinematic data. Motivated by statistical evidence linking aggressive driving to roadway collisions, the authors aimed to mine the Safety Pilot Model Deployment (SPMD) dataset to detect risky driving moments in space and time. The research focuses on processing high-frequency data collected from approximately 3,000 vehicles in Ann Arbor, Michigan, to support traffic safety evaluations and countermeasure prioritization.

The methodology involved four main stages: data exploration, database development, risky driving classification, and visualization tool development. First, the authors compared the performance of relational (PostgreSQL with PostGIS) and non-relational (MongoDB) database management systems for handling the massive SPMD dataset, which contained over 1.5 billion GPS points. They conducted performance tests on nonspatial and spatial queries using subsets of the data. For classification, the study employed both unsupervised and supervised learning techniques. Unsupervised cluster analysis was used to label risky driving events during short monitoring periods based on kinematic variables such as speed, acceleration, and yaw rate. These labeled data were then used to train random forest models for supervised prediction of risky driving. Additionally, the authors developed open-source and enterprise visualization tools to map these events.

The results indicated that PostgreSQL and PostGIS significantly outperformed MongoDB in query execution times for both nonspatial and spatial queries, particularly as data size increased. In terms of driving behavior detection, simple speeding analysis revealed distinct spatial patterns, with severe speeding (>10 mph over the limit) concentrated at highway intersections and ramps, while minor speeding occurred more frequently in downtown areas. The machine learning framework demonstrated high prediction performance, successfully identifying risky driving events using the random forest models trained on kinematic data. The developed visualization tools effectively illustrated the spatiotemporal distribution of these risky events.

The significance of this work lies in its provision of a robust framework for processing and analyzing big data in transportation safety. By demonstrating the superiority of relational databases for spatial queries and validating machine learning approaches for behavior classification, the study offers practical tools for researchers and practitioners. These tools enable the identification of high-risk locations and times, facilitating targeted safety interventions and improving the understanding of how kinematic data can be leveraged to mitigate crash risks associated with aggressive driving.
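The simple speeding analysis mentioned in the summary can be illustrated with a minimal Python sketch. Only the severe threshold (more than 10 mph over the limit) comes from the summary. The `GpsPoint` record, its field names, the sample coordinates, and the rule that any positive excess up to 10 mph counts as "minor" are assumptions made for illustration and may not match the authors' implementation.

```python
from dataclasses import dataclass

# From the summary: severe speeding is more than 10 mph over the limit.
SEVERE_MARGIN_MPH = 10.0


@dataclass
class GpsPoint:
    """Hypothetical record: one GPS sample joined to a posted speed limit."""
    lat: float
    lon: float
    speed_mph: float
    limit_mph: float


def classify_speeding(point: GpsPoint) -> str:
    """Label one sample as 'severe', 'minor', or 'none'.

    Assumption: 'minor' means any positive excess up to the severe margin.
    """
    excess = point.speed_mph - point.limit_mph
    if excess > SEVERE_MARGIN_MPH:
        return "severe"
    if excess > 0:
        return "minor"
    return "none"


# Made-up sample points, for illustration only.
samples = [
    GpsPoint(42.2808, -83.7430, speed_mph=28.0, limit_mph=25.0),
    GpsPoint(42.3001, -83.7002, speed_mph=82.0, limit_mph=70.0),
    GpsPoint(42.2750, -83.7400, speed_mph=24.0, limit_mph=25.0),
]
labels = [classify_speeding(p) for p in samples]
```

Labels of this kind, kept alongside each point's coordinates and timestamp, are the sort of input a mapping tool would need to show where and when each speeding class concentrates.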
Key finding
PostgreSQL and PostGIS databases executed spatial and non-spatial queries significantly faster than MongoDB when processing large-scale vehicle kinematic data.
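A comparison of this kind depends on timing the same query repeatedly on each backend. The sketch below shows one hedged way to structure such a harness in Python. The function names `time_query` and `compare_backends` are hypothetical, and the two callables are stand-ins that do no database work. In a real test each callable would issue the same nonspatial or spatial query against PostgreSQL/PostGIS or MongoDB through its own driver. The summary does not describe the authors' actual benchmarking procedure.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def compare_backends(
    backends: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, float]:
    """Time each named backend callable and map its name to the median time."""
    return {name: time_query(fn, repeats) for name, fn in backends.items()}


# Stand-in workloads (assumptions); replace with real driver calls.
timings = compare_backends(
    {
        "postgres_postgis": lambda: sum(range(1_000)),
        "mongodb": lambda: sum(range(1_000)),
    },
    repeats=3,
)
```

Using the median rather than the mean makes the result less sensitive to a single slow run, for example one affected by a cold cache.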
Methodology
dataset
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | rosap | — | — | 2 | 2026-05-23 |
| archive | success | — | — | — | 1 | 2026-05-23 |
| extract | success | cached | — | — | 2 | 2026-06-10 |
| clean | success | — | — | — | 1 | 2026-06-01 |
| chunk | success | — | — | — | 1 | 2026-06-01 |
| embed | success | — | — | — | 1 | 2026-06-02 |
| enrich | success | — | — | — | 1 | 2026-05-23 |
| promote | success | — | — | — | 1 | 2026-05-23 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 3 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 19 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: crash risk outcomes
- Methodological Resource: dataset resource
- Theoretical Contribution: computational model