N-back Temporal Stability: The Auditory N-back Task as an Unstable Measurement Standard
Abstract
The Auditory N-back Task: An Unstable Standard?
Camille L. Wheatley (1, 2), Joel M. Cooper (3), Kaedyn W. Crabtree (1), Ashleigh V. T. Wise (4), Conner J. Motzkus (5), Spencer C. Castro (1, 6), & David L. Strayer (1, †)
1 University of Utah; 2 People Analytics; 3 Red Scientific Inc.; 4 University of Kansas; 5 University of Windsor; 6 University of California, Merced
† Professor Emeritus, University of Utah
Corresponding author: Camille L. Wheatley, camille.l.wheatley@gmail.com
Summary
Two-experiment study examining the temporal stability of the auditory N-back task (ISO 14198) as a calibration tool for cognitive workload. Experiment 1 found systematic performance improvement (accuracy rising from ~54% to ~93%) and decreasing cognitive demand (indexed by Detection Response Task (DRT) reaction time and NASA-TLX ratings) across 26 on-road driving sessions with 10 participants. Experiment 2 tested whether the improvement reflected memorization of specific digit sequences or acquisition of a general strategy, presenting novel digit sequences to 20 participants who had prior exposure to the task. Old and new sequences produced equivalent performance, ruling out memorization.
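The summary reports N-back accuracy but not how it was scored. The sketch below assumes the delayed-digit-recall form of the auditory task, in which the participant hears a stream of digits and, after each one, says aloud the digit presented N positions earlier; accuracy is then the proportion of scorable trials answered correctly. The function name `nback_accuracy` and the choice to count missing responses as errors are illustrative assumptions, not the authors' scoring procedure.

```python
from typing import Optional, Sequence


def nback_accuracy(stimuli: Sequence[int],
                   responses: Sequence[Optional[int]],
                   n: int) -> float:
    """Proportion correct for one delayed-recall N-back block (hypothetical scoring rule).

    responses[i] is the digit spoken after stimuli[i] was heard, or None if
    no response was given. Only trials i >= n are scorable, because the
    target stimuli[i - n] does not exist for earlier trials.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")
    scorable = range(n, len(stimuli))
    if len(scorable) == 0:
        raise ValueError("block is too short to contain any scorable trial")
    # A missing response (None) never equals a digit, so it counts as an error.
    correct = sum(1 for i in scorable if responses[i] == stimuli[i - n])
    return correct / len(scorable)
```

For example, with stimuli `[3, 7, 1, 4, 9]`, `n = 2`, and responses `[None, None, 3, 7, 5]`, the last three trials are scorable and two of them match their targets, giving an accuracy of 2/3.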
Key finding
The auditory N-back task is NOT stable over repeated use: accuracy increases toward ceiling and cognitive workload decreases significantly across 26 sessions. The improvement transfers to novel digit sequences, indicating general strategy acquisition rather than sequence-specific learning, and is likely driven by adoption of a subvocal rehearsal strategy and/or automatization of component processes. This has implications for multi-session studies that use the N-back as a cognitive reference task: workload estimates will be systematically biased downward in later sessions.
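One simple descriptive way to express "systematic improvement" (not the analysis reported in the study) is the ordinary least-squares slope of a participant's score against session number. Under the pattern described above, the slope would be positive for N-back accuracy and negative for DRT RT and NASA-TLX. The helper `session_slope` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def session_slope(sessions: Sequence[float], scores: Sequence[float]) -> float:
    """Ordinary least-squares slope of a score regressed on session number (hypothetical helper)."""
    if len(sessions) != len(scores) or len(sessions) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(sessions), mean(scores)
    # Sum of squared deviations of session numbers from their mean.
    sxx = sum((x - x_bar) ** 2 for x in sessions)
    if sxx == 0:
        raise ValueError("session numbers must not all be equal")
    # Sum of cross-products of session and score deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sessions, scores))
    return sxy / sxx
```

A straight line is a crude summary when accuracy approaches ceiling, because gains flatten in later sessions; the slope indicates direction more reliably than shape.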
Methodology
Experiment 1: within-subjects repeated-measures design with 10 participants, analyzing 6 evenly spaced sessions (out of 26 total). On-road driving paradigm measuring N-back accuracy, DRT reaction time (RT), DRT hit rate, and NASA-TLX ratings. Experiment 2: 20 participants with 10 or more prior exposures were tested on Old vs. New digit sequences, analyzed with repeated-measures ANOVA.
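DRT hit rate and RT are typically derived from per-stimulus response latencies. The sketch below assumes a valid-response window of 100 to 2500 ms after stimulus onset, which is our understanding of the ISO 17488 convention; the summary does not state the study's criterion. RT is averaged over hits only. `score_drt` and `DrtSummary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class DrtSummary:
    hit_rate: float              # proportion of stimuli followed by a valid response
    mean_rt_ms: Optional[float]  # mean RT over hits only; None when there are no hits


def score_drt(rts_ms: Sequence[Optional[float]],
              window_ms: Tuple[float, float] = (100.0, 2500.0)) -> DrtSummary:
    """Score one DRT block (hypothetical helper).

    rts_ms[i] is the latency (ms) of the first response after stimulus i,
    or None if no response was made. Responses inside the window (inclusive)
    count as hits; early, late, and absent responses count as misses.
    """
    if len(rts_ms) == 0:
        raise ValueError("block contains no stimuli")
    lo, hi = window_ms
    hits = [rt for rt in rts_ms if rt is not None and lo <= rt <= hi]
    hit_rate = len(hits) / len(rts_ms)
    mean_rt = sum(hits) / len(hits) if hits else None
    return DrtSummary(hit_rate=hit_rate, mean_rt_ms=mean_rt)
```

Keeping the window as a parameter makes it easy to match whatever criterion a given study used.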
Sample size: Exp. 1: N = 10 (5 female), mean age = 25.4; Exp. 2: N = 20 (10 female), mean age = 26.5. Both samples were drawn from a larger evaluation of in-vehicle information systems (IVIS) (Strayer et al., 2017).
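Both experiments rest on within-subject comparisons. The following is a minimal sketch of a one-way repeated-measures ANOVA on a complete subjects × conditions table, usable with session as the factor (Experiment 1) or Old/New sequence as the factor (Experiment 2). It assumes no missing cells and applies no sphericity correction such as Greenhouse-Geisser, which the study may or may not have used; `rm_anova_oneway` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def rm_anova_oneway(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA without sphericity correction (hypothetical helper).

    data[s][c] is subject s's score in condition c (e.g. a session, or Old/New).
    Returns (F, df_condition, df_error).
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete subjects x conditions table, at least 2 x 2")
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    # Removing between-subject variance is what makes the design "repeated measures".
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # subject x condition interaction
    if ss_error <= 0:
        raise ValueError("zero residual variance; F is undefined")
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error
```

With two conditions, F equals the square of the paired t statistic, which makes a convenient sanity check. If SciPy is available, a p-value can be obtained with `scipy.stats.f.sf(f_stat, df_cond, df_error)`.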