Drowsiness Detection in Drivers: A Systematic Review of Deep Learning-Based Models

Fonseca, Tiago; Ferreira, Sara · 2025 · MDPI Applied Sciences

Summary

This systematic review evaluates the performance, application contexts, and implementation challenges of deep learning (DL) models for detecting driver drowsiness, a critical factor in road traffic crashes. Motivated by the limitations of traditional monitoring systems and the fragmented state of DL literature, the study aims to synthesize empirical evidence on DL-based detection systems. The review addresses four primary research questions regarding model architectures, precision, dataset characteristics, and development challenges.

Conducted in accordance with PRISMA 2020 guidelines, the review analyzed peer-reviewed empirical studies published between 2015 and 2025. The authors searched five major databases (PubMed, Scopus, Web of Science, ScienceDirect, and IEEE Xplore) and applied strict inclusion criteria, requiring studies to focus on drivers, utilize DL models (such as CNNs, RNNs, or Transformers), and report performance metrics using real or simulated driving data. After screening 1,606 records, 81 studies met the inclusion criteria. Due to methodological heterogeneity, findings were synthesized narratively rather than through meta-analysis.

The results indicate that Convolutional Neural Networks (CNNs) were the most prevalent architecture (35 studies), primarily used for behavioral data like facial expressions and eye movements, followed by Recurrent Neural Networks (RNNs; 16 studies) for physiological time-series data. Hybrid and Transformer-based models were also employed. The majority of studies (46) focused on offline analysis, while 35 targeted real-time detection for Advanced Driver Assistance Systems. Performance metrics were consistently high, with median accuracy, precision, recall, and F1-scores exceeding 0.95, and median AUC-ROC reaching 0.975.

However, the review identified significant limitations, including a lack of demographic diversity in datasets, inconsistent evaluation protocols, and limited transparency regarding dataset sources. Most studies relied on simulated environments or proprietary datasets, raising concerns about generalizability. The authors conclude that while DL models demonstrate strong predictive capabilities, their real-world deployment is hindered by practical and methodological constraints. Key barriers include insufficient attention to ethical and privacy considerations, limited dataset transparency, and a lack of robust validation across diverse driving conditions.

The review highlights a need for future research to prioritize the development of inclusive, multimodal datasets, conduct multi-context evaluations, and establish rigorous standards for ethical compliance and real-world feasibility. Addressing these gaps is essential for translating high-performing laboratory models into scalable, reliable safety systems.

Key finding

Across the 81 included studies, median reported performance is high under both simulated and real-world conditions: accuracy 0.952 (n=77 studies), precision 0.956 (n=42), recall 0.953 (n=51), F1-score 0.953 (n=52), and AUC-ROC 0.975. Models tested in real-world conditions report higher medians (accuracy 0.977, F1-score 0.972) than simulation-only models (accuracy 0.958, F1-score 0.948). Despite these strong reported metrics, real-world deployment is limited by inconsistent metric reporting, limited dataset transparency, lack of demographic diversity, weak validation protocols, and unresolved privacy concerns.
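
The metrics named above follow standard confusion-matrix definitions, and the differing n values suggest that each median was taken only over the studies that reported that metric. The Python sketch below illustrates both ideas under those assumptions. The function names and the `hypothetical_studies` records are illustrative only; they are not taken from the review and do not reproduce its data.

```python
from statistics import median


def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Here "positive" is assumed to mean "drowsy".
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def median_of_reported(studies, metric):
    """Median of one metric over the studies that actually report it.

    Returns (median, n). Studies with a missing or None value are skipped,
    which is one way n can differ from metric to metric.
    """
    values = [s[metric] for s in studies if s.get(metric) is not None]
    if not values:
        return None, 0
    return median(values), len(values)


# Hypothetical per-study records (NOT data from the review).
hypothetical_studies = [
    {"accuracy": 0.95, "precision": 0.96, "recall": 0.94},
    {"accuracy": 0.97, "precision": None, "recall": 0.95},
    {"accuracy": 0.93},
]

for name in ("accuracy", "precision", "recall"):
    value, n = median_of_reported(hypothetical_studies, name)
    print(f"{name}: median={value} (n={n})")
```

Because each median is computed over a different subset of studies, the figures for different metrics are not directly comparable study by study, which is consistent with the review's concern about inconsistent metric reporting.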

Methodology

Systematic review (PRISMA 2020 guidelines; narrative synthesis)

Sample size: 81 studies (qualitative synthesis); no human-subjects sample

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via discover_direct_oa on 2026-05-03 (5 acquisition events logged).

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	—	—	—	1	2026-05-03
archive	success	—	—	—	1	2026-05-03
extract	success	cached	—	—	2	2026-06-10
clean	success	—	—	—	1	2026-06-01
chunk	success	—	—	—	1	2026-06-01
embed	success	—	—	—	1	2026-06-02
enrich	success	crossref	—	—	1	2026-06-04
promote	success	—	—	—	1	2026-05-03
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	18	2026-06-11
verify	partial	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified_with_issues.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: physiological data
Methodological Resource: tool software, validation psychometrics