Cognitive Workload Estimation in Conditionally Automated Vehicles Using Transformer Networks Based on Physiological Data

Wang, Ange; Wang, Jiyao; Shi, Wenxin; He, Dengbo · 2024 · Transportation Research Record

DOI: 10.1177/03611981241250023

archive: archived · pipeline: cataloged · verified

Summary

This study addresses the need for accurate cognitive workload estimation in conditionally automated vehicles (SAE Level 3). While driving automation reduces physical workload, drivers remain legally responsible for safety and must be ready to take control during emergencies. High cognitive load can impair takeover performance, yet existing estimation algorithms are largely designed for non-automated vehicles (SAE Level 0) and often rely on driving performance metrics that are unavailable when drivers are not actively controlling the vehicle. Furthermore, previous methods frequently ignored the temporal dependencies inherent in physiological data. To address these gaps, the authors propose a deep-learning algorithm that estimates driver cognitive load using only physiological signals, using a Transformer-encoder-based Network (TEN) to capture complex temporal correlations.

The researchers used an open dataset from Meteier et al., comprising data from 90 drivers in a driving simulator. Participants performed either a baseline driving task or a high-load cognitive task (oral digit span counting). Three physiological signals were recorded at 1,000 Hz: electrocardiogram (ECG), electrodermal activity (EDA), and respiration (RESP). Preprocessing consisted of noise elimination, downsampling to 50 Hz, and feature extraction over 60-second windows with a 2-second step size. From 56 initial features, seven key variables were selected using LASSO regression to reduce redundancy.

The TEN model, which employs multi-head attention mechanisms to process temporal sequences, was trained and evaluated against five baseline models: k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Light Gradient Boosting Machine (LightGBM), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM).

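
The windowing step described above (60-second windows, 2-second step, 50 Hz after downsampling) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `window_bounds` and the 5-minute recording length are assumptions made for the example, and how the paper handles a trailing partial window is not stated in this summary, so the sketch simply drops it.

```python
def window_bounds(n_samples, fs_hz=50, window_s=60, step_s=2):
    """Return (start, end) sample indices for each full sliding window.

    Defaults follow the values reported in the summary: 50 Hz after
    downsampling, 60-second windows, 2-second step. Dropping a trailing
    partial window is an assumption of this sketch.
    """
    window = int(window_s * fs_hz)  # 3000 samples per window
    step = int(step_s * fs_hz)      # 100 samples between window starts
    if n_samples < window:
        return []
    return [(s, s + window) for s in range(0, n_samples - window + 1, step)]


# Hypothetical 5-minute recording, already downsampled to 50 Hz.
bounds = window_bounds(5 * 60 * 50)
print(len(bounds), bounds[0], bounds[-1])
```

Because consecutive windows share 58 of their 60 seconds, neighboring feature vectors are highly correlated, which is one reason the choice of data partition strategy matters when evaluating such a model.
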
Performance was assessed using two data partition strategies: within-subject (splitting data from the same individual) and across-subjects (training on some individuals and testing on others). The TEN model outperformed all five baseline models. Using the within-subject partition, the TEN achieved an accuracy of 94.4%, surpassing the next best model, LSTM, which reached 89%. In the more rigorous across-subjects partition, the TEN maintained an accuracy of 89%, compared to 83% for LSTM and 79% for LightGBM. The TEN also exhibited superior precision, indicating that its predictions of high cognitive load were highly reliable.

The study attributes this performance gain to the Transformer’s ability to capture long-term temporal dependencies in physiological signals that traditional machine learning models and Recurrent Neural Networks (which follow the Markov property) fail to fully exploit.

The significance of this work lies in providing a robust, physiology-only method for monitoring driver state in SAE Level 3 vehicles, where driving performance metrics are unavailable. By demonstrating that Transformer networks can effectively model the temporal dynamics of physiological data, the study offers a more accurate tool for detecting cognitive overload. This capability is essential for developing safety systems that can alert drivers or initiate countermeasures when cognitive load threatens safe vehicle takeover. The high precision of the model suggests it is particularly suitable for applications where false positives must be minimized to avoid unnecessary interruptions, while still reliably identifying dangerous states of high cognitive load.
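
The across-subjects partition described in the summary keeps every window from a given driver on one side of the split. A minimal sketch of that idea, using only the Python standard library, is shown below. The function name `split_across_subjects`, the 20% test fraction, the fixed seed, and the 10-windows-per-subject layout are all assumptions for illustration; the summary does not report the split ratio the authors used.

```python
import random


def split_across_subjects(subject_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no subject appears in both sets.

    subject_ids[i] is the subject that sample i belongs to. The test
    fraction and seed are illustrative assumptions, not values from
    the paper.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Hypothetical layout: 90 subjects with 10 windows each.
subject_ids = [subject for subject in range(90) for _ in range(10)]
train_idx, test_idx = split_across_subjects(subject_ids)
print(len(train_idx), len(test_idx))
```

A within-subject partition would instead split each driver's own windows between training and testing, so the model sees every individual during training. That difference is consistent with the lower accuracies reported for the across-subjects setting.
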

Key finding

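A Transformer-encoder-based Network using only physiological signals (ECG, EDA, and respiration) estimated driver cognitive load for SAE Level 3 driving with 94.4% accuracy under a within-subject partition and 89% under an across-subjects partition, outperforming five baseline machine learning and deep learning models (KNN, SVM, LightGBM, CNN, and LSTM).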

Methodology

dataset

Sample size: 90

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	—	—	—	1	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	skipped	—	—	—	3	2026-06-04
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: physiological data
Theoretical Contribution: computational model, theory or model