Perception of the visual environment

Tatler, Benjamin W. · 2016 · Crossref

DOI: 10.1075/aicr.93.02tat

archive: archived · pipeline: cataloged · verified


Summary

This review chapter by Benjamin W. Tatler examines the mechanisms governing how humans allocate visual attention and sample information from their environment. The research is motivated by the physiological constraints of vision: high-acuity foveal vision is spatially restricted and temporally intermittent, with eyes fixating on only three to four locations per second. Consequently, understanding how gaze is targeted and how sampled information is retained in memory is critical for explaining human behavior. The author evaluates current computational models and empirical findings across three distinct experimental paradigms: static photographs, dynamic movies, and natural real-world tasks.

The analysis begins with static scene viewing, where fixation selection has traditionally been modeled using "conspicuity-based" approaches, such as the Itti and Koch visual salience model. These models posit that eyes are drawn to locations with high low-level feature contrast (luminance, color, orientation). However, Tatler argues that these models have limited explanatory power, often performing no better than a simple central Gaussian bias, and fail to account for task-dependent variations in viewing behavior. He highlights that higher-level factors, including object location expectations and target appearance templates, significantly improve prediction accuracy. Furthermore, the static paradigm suffers from artifacts such as sudden scene onsets and monitor framing effects, which induce central fixation biases unrelated to scene content.

Regarding dynamic scenes, the review notes that while motion can serve as a salience cue, its predictive power is largely confined to movies with editorial cuts. In continuous, unedited video, dynamic features are poor predictors of fixation; instead, gaze is driven by screen center bias and perceived horizon lines. More advanced models incorporating higher-order derivatives and top-down task information show promise but remain less established than static scene models.

The chapter concludes by emphasizing the limitations of screen-based paradigms in explaining natural behavior. In real-world settings, gaze is tightly coupled with motor actions and behavioral goals. Fixations are directed at task-relevant objects with precise temporal coordination, typically occurring 0.5 to 1 second before manipulation. The author asserts that findings from static and dynamic screen viewing do not fully generalize to natural environments, where the interaction between vision and action fundamentally shapes gaze allocation and visual memory retention.
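The "central Gaussian bias" baseline mentioned for static scenes can be illustrated with a minimal sketch. This is not code from the chapter. The function names, the `sigma_frac` width parameter, and the choice of normalized scanpath saliency (NSS) as the score are assumptions made here for illustration, and the fixation coordinates are made-up placeholders, not data.

```python
import numpy as np


def center_bias_map(height, width, sigma_frac=0.25):
    """Isotropic Gaussian centered on the image, normalized to sum to 1.

    sigma_frac is a hypothetical parameter: the standard deviation as a
    fraction of the shorter image side.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sigma = sigma_frac * min(height, width)
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def nss(pred_map, fixations):
    """Normalized scanpath saliency: mean z-scored map value at fixated pixels.

    fixations is a sequence of (row, col) pixel coordinates.
    """
    z = (pred_map - pred_map.mean()) / pred_map.std()
    rows, cols = zip(*fixations)
    return float(z[list(rows), list(cols)].mean())


# Illustrative (made-up) fixations on a 61 x 81 pixel image.
baseline = center_bias_map(61, 81)
central_fixations = [(30, 40), (28, 43), (33, 37)]
corner_fixations = [(0, 0), (60, 80), (2, 78)]

print("NSS, central fixations:", nss(baseline, central_fixations))
print("NSS, corner fixations:", nss(baseline, corner_fixations))
```

Any content-based salience map can be scored with the same `nss` function on the same fixations. Comparing its score against this content-free baseline is one way to check whether a model predicts more than the tendency to look at the screen center.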

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage      Outcome  Tool               Model                     Prompt   Attempts  Completed
discover   success  Crossref           -                         -        1         2026-06-18
archive    success  unpaywall          -                         -        2         2026-06-25
extract    success  cached             -                         -        2         2026-06-26
clean      success  clean              -                         -        1         2026-06-19
chunk      success  chunk              -                         -        1         2026-06-19
embed      success  embed              Qwen/Qwen3-Embedding-8B   -        1         2026-06-19
promote    success  -                  -                         -        1         2026-06-18
summarize  success  llm                qwen3.6-27b-prismaquant   summ-v5  1         2026-06-26
tag        success  vector_similarity  -                         -        6         2026-06-19
verify     success  -                  -                         -        1         2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.