Drowsiness Detection in Drivers: A Systematic Review of Deep Learning-Based Models

Fonseca, Tiago; Ferreira, Sara · 2025 · MDPI Applied Sciences

DOI: 10.3390/app15169018

URL: https://www.mdpi.com/2076-3417/15/16/9018

archive: archived · pipeline: cataloged

Summary

Systematic review (PRISMA 2020) of deep learning-based driver drowsiness detection models published 2015-2025. The authors searched PubMed, Scopus, Web of Science, ScienceDirect, and IEEE Xplore (March 2025); 1606 records were identified and 81 studies met the inclusion criteria. Eligible studies developed or validated DL models (CNNs, RNNs, LSTMs, Transformer-based) using behavioral, physiological, vehicle-based, or multimodal inputs collected under real or simulated driving conditions. Synthesis is narrative because of methodological heterogeneity. The review reports median performance across architectures and contexts, plus dataset characteristics, deployment constraints, and ethical/privacy considerations. It was conducted by a single reviewer, with ChatGPT-assisted data extraction that was manually cross-verified.

Key finding

Across the 81 included studies, median reported performance is high under both simulated and real-world conditions: accuracy 0.952 (n=77 studies), precision 0.956 (n=42), recall 0.953 (n=51), F1-score 0.953 (n=52), AUC-ROC 0.975. Real-world-tested models report higher medians (accuracy 0.977, F1 0.972) than simulation-only models (accuracy 0.958, F1 0.948). Despite these strong reported metrics, real-world deployment is limited by inconsistent metric reporting, limited dataset transparency, lack of demographic diversity, weak validation protocols, and unresolved privacy concerns.

Methodology

Systematic review following PRISMA 2020 (PROSPERO CRD420251078841). Boolean search 'driver AND (drowsiness OR sleepiness) AND detection AND deep learning' across five databases. Single-reviewer screening in Rayyan; ChatGPT-assisted structured extraction in Excel with manual cross-verification. Inclusion: peer-reviewed empirical English-language journal articles (2015-2025) developing DL models for driver drowsiness detection with reported performance metrics (accuracy, precision, recall, F1, AUC-ROC) on real or simulated driving data. Narrative synthesis with descriptive medians/IQRs; no meta-analysis due to heterogeneity.
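The descriptive synthesis described above (per-metric medians and IQRs over whichever studies report each metric) can be sketched as follows. This is a minimal illustration, not the review's actual extraction workflow: the record layout, the `summarize` helper, and all numeric values are hypothetical placeholders, not data from the included studies. It also assumes the inclusive quartile method, which the review does not specify.

```python
from statistics import median, quantiles

# Hypothetical extraction records (placeholder values, NOT data from the review).
# None marks a metric that a study did not report.
studies = [
    {"id": "S1", "setting": "simulated", "accuracy": 0.90, "f1": 0.88},
    {"id": "S2", "setting": "simulated", "accuracy": 0.94, "f1": None},
    {"id": "S3", "setting": "real", "accuracy": 0.96, "f1": 0.95},
    {"id": "S4", "setting": "real", "accuracy": 0.98, "f1": 0.97},
    {"id": "S5", "setting": "simulated", "accuracy": None, "f1": 0.92},
]


def summarize(records, metric):
    """Return n, median, and (Q1, Q3) for one metric, skipping unreported values."""
    values = sorted(r[metric] for r in records if r.get(metric) is not None)
    n = len(values)
    if n == 0:
        return {"n": 0, "median": None, "iqr": None}
    if n == 1:
        # Quartiles are undefined for a single observation.
        return {"n": 1, "median": values[0], "iqr": None}
    # Assumption: inclusive quartiles; the review does not state its method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"n": n, "median": median(values), "iqr": (q1, q3)}


# Overall summary per metric; n differs by metric because reporting is inconsistent.
overall = {m: summarize(studies, m) for m in ("accuracy", "f1")}

# Subgroup summary by test setting (real-world vs. simulated).
by_setting = {
    s: summarize([r for r in studies if r["setting"] == s], "accuracy")
    for s in ("real", "simulated")
}
```

Because each metric is summarized only over the studies that report it, the per-metric n varies, which is why the review lists a separate n for accuracy, precision, recall, and F1.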

Sample size: 81 studies (qualitative synthesis); no human-subjects sample

Quality score: 5 / 5

Topics