Behavior-based Predictive Safety Analytics - Pilot Study [supporting datasets]

Engström, Johan; Miller, Andrew; Huang, Wenyan; Soccolich, Susan A.; Machiani, Sahar Ghanipoor; Jahangiri, Arash; Dreger, Felix; de Winter, Joost · 2018 · ROSA P / Safety through Disruption (Safe-D) University Transportation Center (UTC)



Summary

This document is a metadata record and data description for the dataset supporting the report "Behavior-based Predictive Safety Analytics – Pilot Study," funded by the U.S. Department of Transportation and preserved by the Virginia Tech Transportation Institute. The primary research objective was to develop statistical models that predict individual drivers' crash involvement, and in particular to determine whether driving style, demographic information, and behavioral history are reliable predictors of safety outcomes.
The methodology relied on a subset of data from the Strategic Highway Research Program 2 (SHRP2) Naturalistic Driving Study. To be included in the analytical dataset, a participant had to have taken part in SHRP2 data collection for at least seven months and to have driven more than 1,000 miles during the designated study period. For each included driver, a six-month interval (months 2–7 of data collection) was extracted to calculate driving style measures and to assess crash and near-crash involvement. Questionnaire data collected before the start of SHRP2 data collection were also retrieved to capture driver behaviors and risk perception. The final dataset comprises 2,800 drivers, representing 3.91 million trips, 27.16 million miles of driving, and 0.69 million driving hours.
The dataset is organized at the driver level (one record per driver) and consists of continuous variables. It integrates several types of data: questionnaire factors on driver behaviors and risk perception; exposure metrics based on driving time and number of trips; crash-related data; and driver behavior variables mined from the six-month study period. The data package includes an Excel file with the processed data and a PDF data dictionary defining the variables; missing values are coded as "NA".
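
The inclusion criteria described above amount to a per-driver filter. The sketch below is a minimal Python illustration under stated assumptions, not the study's actual processing code: the `DriverRecord` type and its fields `months_enrolled` and `window_miles` are hypothetical (the real variable names are defined in the PDF data dictionary), and it assumes the 1,000-mile threshold is checked against miles driven in the study period.

```python
from dataclasses import dataclass

# Hypothetical per-driver summary; field names are illustrative only and
# are NOT taken from the dataset's data dictionary.
@dataclass
class DriverRecord:
    driver_id: str
    months_enrolled: int   # months of SHRP2 data collection
    window_miles: float    # miles driven during the study period (assumed)

MIN_MONTHS = 7               # at least seven months of participation
MIN_MILES = 1_000.0          # must be strictly more than 1,000 miles
WINDOW_MONTHS = range(2, 8)  # months 2-7 of data collection (six months)

def is_eligible(d: DriverRecord) -> bool:
    """Apply the two inclusion criteria described in the summary."""
    return d.months_enrolled >= MIN_MONTHS and d.window_miles > MIN_MILES

def select_drivers(records: list[DriverRecord]) -> list[DriverRecord]:
    """Keep only drivers who satisfy both criteria."""
    return [d for d in records if is_eligible(d)]

if __name__ == "__main__":
    sample = [
        DriverRecord("A", 12, 5400.0),
        DriverRecord("B", 6, 8000.0),   # excluded: fewer than seven months
        DriverRecord("C", 9, 1000.0),   # excluded: not more than 1,000 miles
    ]
    print([d.driver_id for d in select_drivers(sample)])  # ['A']
    print(len(WINDOW_MONTHS))  # 6
```

Note that the mileage test uses a strict inequality, matching "more than 1,000 miles", so a driver with exactly 1,000 miles is excluded.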
The dataset is categorized under Engineering, with keywords including Crash, Near Crash, Driver Behavior Questionnaire, Crash Rate, and Driver Behaviors. Its significance for predictive safety analytics is that it links behavioral history to safety outcomes for a large naturalistic driving sample. Because the data are publicly available through the Virginia Tech Transportation Institute repository and the National Transportation Library, other researchers can use them to study how individual driving styles and demographic factors influence crash risk, and to validate and extend statistical models for predicting crash involvement.
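
Since "Crash Rate" is one of the dataset's keywords and exposure metrics are provided alongside crash data, a common first step is to normalize event counts by exposure. The following Python sketch does this with the standard library only. It assumes the Excel sheet has been exported to CSV, and the column names `driver_id`, `crash_nearcrash_count`, and `miles` are hypothetical placeholders, not names from the data dictionary. Reporting rates per million miles is likewise an illustrative choice. Cells containing "NA", the dataset's missing-value code, are treated as missing rather than as zero.

```python
import csv
import io
from typing import Optional

# Hypothetical column names; the real ones are defined in the PDF data dictionary.
ID_COL = "driver_id"
EVENTS_COL = "crash_nearcrash_count"
MILES_COL = "miles"

def parse_value(raw: str) -> Optional[float]:
    """Convert a cell to float, mapping the dataset's 'NA' code (or blank) to None."""
    raw = raw.strip()
    if raw in ("NA", ""):
        return None
    return float(raw)

def rate_per_million_miles(events: Optional[float], miles: Optional[float]) -> Optional[float]:
    """Events per million miles; None if either input is missing or miles <= 0."""
    if events is None or miles is None or miles <= 0:
        return None
    return events * 1_000_000 / miles

def driver_rates(csv_text: str) -> dict[str, Optional[float]]:
    """Compute one rate per driver from CSV text with the hypothetical columns above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[ID_COL]: rate_per_million_miles(
            parse_value(row[EVENTS_COL]), parse_value(row[MILES_COL])
        )
        for row in reader
    }

if __name__ == "__main__":
    sample = (
        "driver_id,crash_nearcrash_count,miles\n"
        "D1,3,12000\n"
        "D2,NA,8000\n"
    )
    print(driver_rates(sample))  # {'D1': 250.0, 'D2': None}
```

Returning `None` for missing or zero exposure keeps those drivers visible in the output instead of silently dropping them or assigning a misleading rate of zero.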

Key finding

The compiled driver-level dataset covers 2,800 drivers, 3.91 million trips, 27.16 million miles, and 0.69 million driving hours over a six-month window.
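
The aggregates above imply some simple averages that help put the sample in perspective. The short Python check below computes them; these ratios are derived here from the reported (rounded) totals, not reported in the source, so they are approximate and describe the pooled sample rather than any individual driver.

```python
# Aggregates reported for the compiled dataset (rounded in the source).
DRIVERS = 2_800
TRIPS = 3.91e6
MILES = 27.16e6
HOURS = 0.69e6

# Pooled averages over the six-month window.
per_driver = {
    "miles": MILES / DRIVERS,
    "trips": TRIPS / DRIVERS,
    "hours": HOURS / DRIVERS,
}
miles_per_trip = MILES / TRIPS
mean_speed_mph = MILES / HOURS

if __name__ == "__main__":
    for name, value in per_driver.items():
        print(f"{name} per driver: {value:,.0f}")
    print(f"miles per trip: {miles_per_trip:.2f}")
    print(f"average speed: {mean_speed_mph:.1f} mph")
```

Run as a script, it prints about 9,700 miles, 1,396 trips, and 246 hours per driver over the six-month window, roughly 6.95 miles per trip, and an overall average speed of about 39.4 mph. These are pooled ratios and say nothing about the spread across drivers.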

Methodology

dataset

Sample size: 2,800 drivers

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (7 acquisition events logged).

Stage      Outcome  Tool               Model                    Prompt   Attempts  Completed
discover   success  rosap              -                        -        2         2026-05-23
archive    success  -                  -                        -        1         2026-05-23
extract    success  cached             -                        -        3         2026-06-10
clean      success  -                  -                        -        1         2026-06-01
chunk      success  -                  -                        -        1         2026-06-01
embed      success  -                  -                        -        1         2026-06-02
enrich     success  -                  -                        -        1         2026-05-23
promote    success  -                  -                        -        1         2026-05-23
summarize  success  llm                qwen3.6-27b-prismaquant  summ-v5  4         2026-06-10
tag        success  vector_similarity  -                        -        19        2026-06-11
verify     partial  -                  -                        -        3         2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified_with_issues.
