Using Naturalistic Driving Performance Data to Develop an Empirically Defined Model of Distracted Driving

Bingham, C. Raymond; Bao, Shan; Flannagan, Carol; Pradhan, Anuj K. · 2016 · ROSA P / Nextrans



Summary

This study addresses the challenge of empirically defining distracted driving by developing a stochastic model capable of identifying driver distraction using only vehicle kinematic data. The research is motivated by the discrepancy between simulator studies, which consistently show performance deficits during secondary tasks, and crash data, which often under-report distraction or show decreasing crash rates. To overcome the limitations of police reports and self-reported data, the authors aimed to create an algorithm that detects cellular phone use, a specific form of distraction, by analyzing naturalistic driving performance metrics without relying on video or audio inputs.

The researchers used data from the Integrated Vehicle-Based Safety Systems Field Operational Test (IVBSS FOT), involving 108 drivers who operated instrumented vehicles for six weeks. The analysis focused on baseline driving data collected during the first two weeks, when warning systems were disabled. The study specifically targeted cellular phone use, including hand-held and hands-free interactions. To build the dataset, the authors identified 349 five-second "case" clips containing phone use from 35 drivers. They selected control clips from the same drivers, matched for time of day, roadway type, and traffic density, resulting in a final dataset of 10,054 matched clips.

Driving performance measures included accelerator pedal use, driving distance, speed, and lane offset. These variables were transformed from the time domain to the frequency domain using the Fast Fourier Transform (FFT) to analyze variations in spectral power. The core methodology applied Hidden Markov Modeling (HMM) to classify driving states. The HMM was configured with five hidden states and used a sliding window approach to process continuous data streams, allowing for real-time prediction.

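
As an illustrative sketch (not the authors' implementation), the time-to-frequency step described above could look like the following in Python. It computes a one-sided power spectrum for each sliding window of a kinematic signal such as lane offset. The function names, the 10 Hz sampling rate, the simulated signal, and the window and step lengths are assumptions made for illustration; the summary does not specify them. A naive discrete Fourier transform is used so the example needs only the standard library; an FFT routine such as numpy.fft.rfft would produce the same coefficients faster.

```python
import cmath
import math


def power_spectrum(samples, sample_rate_hz):
    """Return (freqs_hz, power) for the one-sided spectrum of a real signal."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # drop the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for frequency bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k * sample_rate_hz / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(freqs, power, low_hz, high_hz):
    """Sum spectral power over bins whose frequency lies in [low_hz, high_hz]."""
    return sum(p for f, p in zip(freqs, power) if low_hz <= f <= high_hz)


def sliding_windows(samples, window_len, step):
    """Split a signal into overlapping fixed-length windows."""
    last_start = len(samples) - window_len
    return [samples[i:i + window_len] for i in range(0, last_start + 1, step)]


# Hypothetical input: 20 s of lane offset sampled at an assumed 10 Hz,
# simulated here as a slow 0.2 Hz weave.
RATE_HZ = 10
lane_offset = [0.3 * math.sin(2 * math.pi * 0.2 * t / RATE_HZ) for t in range(200)]

# One low-band power value per 5 s window, advancing 1 s at a time.
features = []
for window in sliding_windows(lane_offset, window_len=50, step=10):
    freqs, power = power_spectrum(window, RATE_HZ)
    features.append(band_power(freqs, power, 0.0, 0.5))
```

Per-window spectral features of this kind are the sort of observation sequence a sequence classifier such as an HMM could then consume.
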
The study evaluated three modeling protocols: individual models (trained and tested on single subjects), generic models (trained on the entire dataset), and leave-one-out models (trained on all subjects except one, tested on the excluded subject). Performance was assessed using accuracy, equal error rate, precision-recall curves, and Area Under the Curve (AUC) metrics.

The results demonstrated that individual-based models significantly outperformed generic models, achieving an accuracy of 0.88 and an error rate of 0.27, compared to 0.59 accuracy and 0.38 error rate for generic models. Frequency domain analysis revealed that texting tasks caused a higher variation in lane offset power, particularly in the low-frequency band (0–0.5 Hz), indicating erratic lane control immediately upon engaging in the task. The ROC curves confirmed that the classifier could distinguish texting behavior from baseline driving better than random guessing.

The study concludes that stochastic modeling algorithms like HMM are effective for detecting driver states using kinematic features alone, though individualized models are necessary for high accuracy. This approach offers a viable method for monitoring driver distraction in naturalistic settings without requiring direct observation of the driver.
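
The leave-one-out protocol and the equal error rate mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the clip tuple layout (driver_id, features, label), the function names, and the threshold sweep used to approximate the equal error rate are all hypothetical.

```python
def leave_one_subject_out(clips):
    """Yield (held_out_driver, train, test) splits.

    Each clip is an assumed (driver_id, features, label) tuple.
    """
    drivers = sorted({clip[0] for clip in clips})
    for driver in drivers:
        train = [clip for clip in clips if clip[0] != driver]
        test = [clip for clip in clips if clip[0] == driver]
        yield driver, train, test


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A clip is predicted positive when its score is >= the threshold. The
    result is the mean of the false-positive and false-negative rates at
    the threshold where the two rates are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, best_rate = None, None
    for threshold in sorted(set(scores)):
        fnr = sum(s < threshold for s in pos) / len(pos)
        fpr = sum(s >= threshold for s in neg) / len(neg)
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (fpr + fnr) / 2
    return best_rate


# Hypothetical toy data: three drivers, label 1 = phone use, 0 = control.
clips = [
    ("d1", [0.9], 1), ("d1", [0.2], 0),
    ("d2", [0.7], 1), ("d2", [0.4], 0),
    ("d3", [0.8], 1), ("d3", [0.1], 0),
]
splits = list(leave_one_subject_out(clips))
```

Each split keeps every clip from the held-out driver out of training, which is what separates this protocol from the generic model trained on the entire dataset.
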

Key finding

Individual-based Hidden Markov Model classifiers achieved 0.88 accuracy in identifying distracted driving episodes, significantly outperforming generic models, which achieved 0.59 accuracy.

Methodology

naturalistic

Sample size: 35

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	rosap	—	—	2	2026-05-23
archive	success	—	—	—	1	2026-05-23
extract	success	cached	—	—	2	2026-06-10
clean	success	—	—	—	1	2026-06-01
chunk	success	—	—	—	1	2026-06-01
embed	success	—	—	—	1	2026-06-02
enrich	success	—	—	—	1	2026-05-23
promote	success	—	—	—	1	2026-05-23
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	3	2026-06-10
tag	success	vector_similarity	—	—	19	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: observational prevalence
Methodological Resource: dataset resource
Theoretical Contribution: computational model