Quantifying Drowsy Driving [Traffic Safety Facts Research Note]

NHTSA · 2020 · ROSA P / United States. National Highway Traffic Safety Administration. Office of Behavioral Safety Research

Archive: archived · Pipeline: cataloged, verified


Summary

This research note addresses the underestimation of drowsy driving in official statistics. Those statistics rely on police crash reports, which often miss non-crash events and subtle impairment. To quantify drowsy driving more accurately, the study explored whether machine learning algorithms could identify drowsy driving episodes in the Strategic Highway Research Program 2 (SHRP2) Naturalistic Driving Study (NDS) dataset. The SHRP2 NDS provides extensive time-series vehicle data and driver-facing video from more than 3,400 drivers, which makes it possible to detect drowsiness in both crash and incident-free driving. The researchers developed two algorithms to classify driving epochs as drowsy or non-drowsy, using the Observer Rating of Drowsiness (ORD) protocol as the ground truth. Three analysts manually coded 789 one-minute video epochs, yielding 741 usable ratings. Algorithm 1 used time-history vehicle data, including yaw rate, lane position deviations, lane position slope, trip start time, and trip duration. The researchers applied four tree-based classifiers to these data and tested performance against ORD thresholds of 50 (moderately drowsy) and 70 (very drowsy). Algorithm 2 analyzed face video to extract facial landmarks and head orientation metrics, such as PERCLOS (percentage of eyelid closure) and head pitch, yaw, and roll. Five machine learning models were trained on these features to predict drowsiness. Both approaches achieved moderate success but showed significant limitations. For Algorithm 1, the extra trees classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.72 at the ORD threshold of 50, while the gradient boosting classifier achieved an AUC of 0.76 at the threshold of 70. Feature importance analysis showed that trip start time and trip duration were the most predictive features for moderate drowsiness, whereas lane position slope was the most important feature for severe drowsiness.
For Algorithm 2, the extreme gradient boosting model performed best, with an AUC of 0.76, and relied heavily on eye gap, mouth gap, and head orientation features. However, both algorithms produced notable misclassifications. Error analysis showed that the vehicle-data algorithm often confused traffic-responsive maneuvers (e.g., hard braking in heavy traffic) with drowsiness, while the video algorithm struggled with drivers who had unusual facial features or were engaged in other activities, such as talking. The study concludes that naturalistic driving data hold promise for screening large datasets for potential drowsy driving episodes, but current algorithms are not robust enough for consistent identification or for real-time warning systems. Vehicle sensor data can effectively screen for severe drowsiness through lane drift, while video data offer complementary insights but suffer from high false alarm rates. Algorithmic accuracy must improve substantially before these methods can reliably quantify drowsy driving in naturalistic settings.
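The two labeling steps described in the summary (turning ORD ratings into binary drowsy/non-drowsy labels at a threshold, and summarizing eye closure as PERCLOS) can be illustrated with a short Python sketch. It is not the study's code. It assumes ORD scores on a 0 to 100 scale, treats an epoch as drowsy when its score is at or above the threshold, and computes PERCLOS from a hypothetical per-frame eye-openness signal using the common P80 convention (a frame counts as closed when the eye is at least 80% closed). The names `label_epoch` and `perclos` are illustrative only.

```python
from typing import Sequence

# Thresholds named in the summary: 50 (moderately drowsy) and 70 (very drowsy).
ORD_MODERATE = 50
ORD_SEVERE = 70


def label_epoch(ord_score: float, threshold: float) -> int:
    """Binary ground-truth label for one epoch from its ORD score.

    Assumption: scores at or above the threshold count as drowsy (1).
    """
    return 1 if ord_score >= threshold else 0


def perclos(eye_openness: Sequence[float], closed_openness: float = 0.2) -> float:
    """Fraction of frames in which the eye counts as closed.

    `eye_openness` is a hypothetical per-frame signal in [0, 1], where 1.0 is
    fully open. A frame counts as closed when openness <= `closed_openness`;
    the default of 0.2 corresponds to the P80 convention (at least 80% closed).
    """
    if len(eye_openness) == 0:
        raise ValueError("eye_openness must contain at least one frame")
    closed = sum(1 for o in eye_openness if o <= closed_openness)
    return closed / len(eye_openness)


if __name__ == "__main__":
    ord_scores = [35, 55, 72, 90]
    print([label_epoch(s, ORD_MODERATE) for s in ord_scores])  # [0, 1, 1, 1]
    print([label_epoch(s, ORD_SEVERE) for s in ord_scores])    # [0, 0, 1, 1]
    print(perclos([1.0, 0.9, 0.1, 0.15, 0.8]))                 # 0.4
```

Raising the threshold from 50 to 70 turns the 55-point epoch into a non-drowsy example, so each threshold yields a different set of positive examples; the summary accordingly reports separate results for each.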

Key finding

Machine learning algorithms using vehicle sensor data and facial video analysis achieved moderate classification performance (AUC 0.72 to 0.76) in identifying drowsy driving episodes. However, they were not robust enough for consistent detection because of frequent misclassifications, such as traffic-responsive maneuvers in heavy traffic and drivers with unusual facial features or engaged in other activities.
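To make the reported AUC values concrete: the area under the ROC curve equals the probability that a randomly chosen drowsy epoch receives a higher classifier score than a randomly chosen non-drowsy epoch, with ties counted as half. The Python sketch below computes AUC directly from that pairwise (Mann-Whitney) definition. It illustrates the metric on toy scores and is not the study's evaluation code; `pairwise_auc` is a hypothetical name. An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, so values of 0.72 to 0.76 mean the classifiers ranked a drowsy epoch above a non-drowsy one roughly 72 to 76 percent of the time.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) definition.

    Counts, over every (drowsy, non-drowsy) pair, how often the drowsy epoch
    scores higher; ties contribute 0.5. O(n_pos * n_neg), fine for small sets.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one drowsy and one non-drowsy epoch")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 1 = drowsy epoch, 0 = non-drowsy epoch (illustrative values).
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(pairwise_auc(labels, scores))  # 0.75
```

For larger samples, scikit-learn's `sklearn.metrics.roc_auc_score(labels, scores)` computes the same quantity more efficiently.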

Methodology

naturalistic

Sample size: more than 3,400 drivers in the SHRP2 NDS dataset; 741 usable ORD-rated one-minute epochs

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	rosap	—	—	2	2026-05-23
archive	success	—	—	—	1	2026-05-23
extract	success	cached	—	—	2	2026-06-10
clean	success	—	—	—	1	2026-06-01
chunk	success	—	—	—	1	2026-06-01
embed	success	—	—	—	1	2026-06-02
enrich	success	—	—	—	1	2026-05-23
promote	success	—	—	—	1	2026-05-23
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	3	2026-06-10
tag	success	vector_similarity	—	—	19	2026-06-11
verify	partial	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified_with_issues.

Topics

Ranked by relevance to this paper.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: physiological data
Methodological Resource: tool software
Theoretical Contribution: computational model