Multimodal Detection of Driver Distraction
archive: archived · pipeline: cataloged · verified
Summary
This research addresses distracted driving, which remains a leading cause of crashes despite existing legislation. The study aims to develop an automated system that detects driver distraction and issues warnings early enough to intervene before a dangerous situation develops. Building on previous work that relied solely on speech analysis, the project adds further data modalities to the detection framework to improve accuracy and predictive capability.

To achieve this, the researchers collected a new multimodal dataset from 30 subjects using the OpenDS driving simulator. Data collection comprised video from a back-facing camera, car telemetry (such as gas pedal usage), and speech. A key methodological improvement was the annotation process: instead of relying on third-party observers, subjects reviewed their own driving sessions and marked moments when they felt less attentive or noticed distractions, using a modified NASA Task Load Index. The simulated routes were modified to include hairpin turns, stop lights, street signs, and minor visual distractions such as small packages on the roadway.

The study used Multimodal Polynomial Fusion (MPF) to integrate three modalities: facial information (landmarks, head pose, glances, eye gaze, and facial action units), speech, and car information. The MPF approach outperformed the baseline models, including the best-performing baseline neural network (NN-Cube). On the test set, the MPF model achieved an Area Under the Curve (AUC) of 0.7152, an Equal Error Rate (EER) of 0.3416, and an F1 score of 0.5641, compared with the baseline’s 0.7048, 0.3488, and 0.5453, respectively. Using multiple modalities gave the best tradeoff between false positives and false negatives and the best overall detection performance.

The research also evaluated how far in advance distraction could be predicted, since detecting it only during the dangerous event comes too late for prevention. The MPF model reliably predicted dangerous distraction situations up to 6 seconds in advance and fairly reliably up to 8 seconds in advance, which meets the practical requirement of warning drivers before consequences arise.

The work demonstrates that multimodal machine learning can provide early, reliable warnings of driver distraction. The findings suggest that additional training data covering individual differences could further improve prediction accuracy at the 8-second horizon. The project produced a publicly available database of 30 drivers with multimodal information and open-source detection algorithms, contributing resources to the field of safe transportation technologies.
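The summary does not spell out how the polynomial fusion combines the face, speech, and car features. As a rough illustration only, the sketch below assumes one common construction: each modality vector is prefixed with a constant 1 and all cross-modal products are taken, so the fused vector holds unimodal, bimodal, and trimodal interaction terms. The function names, feature values, and the linear scoring step are hypothetical and are not taken from the paper or its released code.

```python
from itertools import product


def polynomial_fusion(face, speech, car):
    """Fuse three modality feature vectors into one interaction vector.

    Assumed construction (not confirmed by the source): each vector is
    prefixed with a constant 1.0, then every product of one entry per
    modality is taken. The constant keeps the unimodal and bimodal terms
    alongside the trimodal ones.
    """
    augmented = [[1.0] + list(vec) for vec in (face, speech, car)]
    return [f * s * c for f, s, c in product(*augmented)]


def linear_score(fused, weights):
    """Weighted sum of fused features; the weights would be learned in practice."""
    return sum(w * x for w, x in zip(weights, fused))


# Toy, made-up feature values (not drawn from the dataset).
face = [0.2, 0.7]   # e.g. a gaze feature and a head-pose feature
speech = [0.5]      # e.g. a speech-rate feature
car = [0.1]         # e.g. gas pedal position
fused = polynomial_fusion(face, speech, car)
```

With input sizes 2, 1, and 1, the fused vector has (2+1)(1+1)(1+1) = 12 entries. The size grows multiplicatively with each modality's dimension, which is why practical systems usually compress or factorize such a representation.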
Key finding
Multimodal polynomial fusion of face, speech, and car signals detected driver distraction better than the best baseline (test-set AUC of 0.7152 versus 0.7048) and reliably predicted dangerous distraction up to 6 seconds in advance.
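For readers unfamiliar with the reported metrics, the following is a minimal sketch of how AUC, EER, and F1 can be computed from binary labels and detector scores. The labels and scores are invented toy values, the function names are hypothetical, and the paper's own evaluation code may differ (for example in how thresholds are chosen or interpolated).

```python
def auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rates(labels, scores, threshold):
    """Return (false positive rate, false negative rate), predicting 1 when score >= threshold."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return fp / labels.count(0), fn / labels.count(1)


def equal_error_rate(labels, scores):
    """Error rate at the observed threshold where FPR and FNR are closest (no interpolation)."""
    rates = [error_rates(labels, scores, t) for t in sorted(set(scores))]
    fpr, fnr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (fpr + fnr) / 2


def f1_score(labels, scores, threshold):
    """Harmonic mean of precision and recall at a fixed threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy data: 1 = distracted, 0 = attentive (made-up values).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
toy_auc = auc(labels, scores)
toy_eer = equal_error_rate(labels, scores)
toy_f1 = f1_score(labels, scores, threshold=0.7)
```

A higher AUC and F1 and a lower EER indicate a better detector, which is the direction of the improvement reported over the baseline.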
Methodology
simulator
Sample size: 30
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (7 acquisition events logged).
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | rosap | — | — | 2 | 2026-05-23 |
| archive | success | — | — | — | 1 | 2026-05-23 |
| extract | success | cached | — | — | 2 | 2026-06-10 |
| clean | success | — | — | — | 1 | 2026-06-01 |
| chunk | success | — | — | — | 1 | 2026-06-01 |
| embed | success | — | — | — | 1 | 2026-06-02 |
| enrich | success | — | — | — | 1 | 2026-05-23 |
| promote | success | — | — | — | 1 | 2026-05-23 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 3 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 19 | 2026-06-11 |
| verify | partial | — | — | — | 3 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified_with_issues.
Topics
Ranked by relevance to this paper.
- distraction detection algorithms
- visual
- external distraction
- drowsiness detection algorithms
- auditory
- visual manual
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: behavioral performance data
- Methodological Resource: tool software
- Theoretical Contribution: conceptual framework