The Quality of Response Time Data Inference: A Blinded, Collaborative Assessment of the Validity of Cognitive Models

Dutilh, Gilles; Annis, Jeffrey; Brown, Scott; Cassey, Peter; Evans, Nathan J.; Grasman, Raoul P. P. P.; Hawkins, Guy E.; Heathcote, Andrew; Holmes, William R.; Krypotos, Angelos‐Miltiadis; Kupitz, Colin; Leite, Fábio P.; Lerche, Veronika; Lin, Yi-Shin; Logan, Gordon D.; Palmeri, Thomas J.; Starns, Jeffrey J.; Trueblood, Jennifer S.; Maanen, Leendert van; Ravenzwaaij, Don van; Vandekerckhove, Joachim; Visser, Ingmar; Voß, Andreas; White, Corey N.; Wiecki, Thomas V.; Rieskamp, Jörg; Donkin, Chris · 2018 · Psychonomic Bulletin & Review

DOI: 10.3758/s13423-017-1417-2

archive: archived; pipeline: cataloged; verified

Summary

This paper addresses the validity of inferences drawn from cognitive models of response time (RT) data, specifically evidence-accumulation models such as the diffusion model and the Linear Ballistic Accumulator (LBA) model. These models translate observed RTs and accuracy into latent psychological constructs: ease of processing, response caution, response bias, and non-decision time. The authors highlight a critical threat to validity, "researcher degrees of freedom": analysts face numerous arbitrary choices about model selection, estimation methods, and inference procedures, and these choices may bias conclusions. Previous validation studies provided mixed support for the convergent and discriminant validity of these models, but they often lacked blinding and tested only a single method. This study assesses how robust model-based inferences are to these analytical choices in a realistic, collaborative setting.

The authors conducted a blinded, collaborative assessment in which 17 teams of experts each analyzed the same 14 two-condition data sets. The data came from a random dot motion task performed by 20 participants. The experimental design manipulated three factors: stimulus difficulty (easy vs. hard), response caution (speed vs. accuracy emphasis instructions), and response bias (balanced vs. skewed stimulus probabilities). The teams were blind to the manipulations in each data set and were asked to infer which psychological construct (ease, caution, bias, or non-decision time) differed between the two conditions, using their preferred models and analytical methods. This design allowed the authors to evaluate the validity of inferences across a wide range of currently popular analytical approaches while guarding against the bias that arises when analysts know the expected results.

The results showed that conclusions were generally similar across methods, but the "modeler’s degrees of freedom" did affect the specific inferences drawn. Notably, simpler analytical approaches and models yielded inferences that were as robust and accurate as those from more complex methods. Because the study was blinded, the validity assessment was not confounded by analysts tailoring their choices to match known outcomes. The findings suggest that the choice of model or estimation technique has less impact on the validity of high-level inferences than previously feared, provided the models are applied correctly.

The significance of this work lies in its recommendation for standardizing RT data analysis. The authors argue that cognitive models should become a typical analysis tool for response time data, particularly in standard experimental designs. They conclude that simpler models and procedures are often sufficient and may be preferable because of their robustness. The paper also outlines situations where more complicated models are necessary and discusses potential pitfalls in interpreting model outputs. By demonstrating that valid inferences can be drawn despite analytical variability, the study supports the utility of cognitive modeling while advocating transparency and the use of simpler, well-validated methods to minimize researcher bias.
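To make the mapping from observed data to latent constructs concrete, the sketch below uses the closed-form EZ-diffusion equations (Wagenmakers, van der Maas, and Grasman, 2007) as one example of a "simpler" approach. The summary above does not name this method, so its use here is an illustrative assumption, and the function name `ez_diffusion` and the input values are hypothetical. The method takes three summary statistics (proportion correct, variance of correct RTs, mean of correct RTs) and returns drift rate (ease of processing), boundary separation (response caution), and non-decision time. It assumes an unbiased starting point, so it cannot detect a response-bias difference.

```python
import math


def ez_diffusion(prop_correct, rt_var, rt_mean, s=0.1):
    """Return (drift, boundary, non_decision) from EZ-diffusion closed forms.

    prop_correct: proportion of correct responses, strictly between 0 and 1
    rt_var:       variance of correct response times, in seconds squared
    rt_mean:      mean of correct response times, in seconds
    s:            scaling parameter; 0.1 is the conventional choice
    """
    if not 0.0 < prop_correct < 1.0 or prop_correct == 0.5:
        # The equations are undefined at 0, 0.5, and 1; an edge
        # correction would have to be applied before calling this.
        raise ValueError("prop_correct must be in (0, 1) and not equal 0.5")
    if rt_var <= 0.0:
        raise ValueError("rt_var must be positive")

    logit = math.log(prop_correct / (1.0 - prop_correct))
    x = logit * (
        logit * prop_correct**2 - logit * prop_correct + prop_correct - 0.5
    ) / rt_var

    # Drift rate: the sign follows whether accuracy is above or below chance.
    drift = math.copysign(1.0, prop_correct - 0.5) * s * x**0.25
    # Boundary separation.
    boundary = s**2 * logit / drift
    # Mean decision time, subtracted from mean RT to get non-decision time.
    y = -drift * boundary / s**2
    mean_decision_time = (boundary / (2.0 * drift)) * (
        (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    )
    non_decision = rt_mean - mean_decision_time
    return drift, boundary, non_decision


# Hypothetical summary statistics for one condition (not from the paper).
v, a, ter = ez_diffusion(prop_correct=0.802, rt_var=0.112, rt_mean=0.723)
print(f"drift={v:.3f} boundary={a:.3f} non_decision={ter:.3f}")
# prints: drift=0.100 boundary=0.140 non_decision=0.300
```

In a two-condition data set of the kind described above, one would run such an estimator on each condition separately and compare the resulting parameters. Deciding which difference is reliable still requires an inference procedure across participants, which this sketch leaves out.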

Key finding

Simpler cognitive modeling approaches yielded inferences that were as robust and accurate as more complex methods, though researcher degrees of freedom did affect the specific conclusions drawn.

Methodology

dataset

Sample size: 20 participants

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-04
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	success	—	—	—	1	2026-05-28
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

workload measurement

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: behavioral performance data
Theoretical Contribution: computational model