DriveSAM: Cognitive Perspective on Driving Maneuvers Based on Drivers’ Attention Using Eye Gaze Data

Kwakye, Kelvin; Seong, Younho; Yi, Sun; Aboah, Armstrong · 2023 · Crossref

DOI: 10.46254/ev01.20230071

archive: archived pipeline: cataloged verified

Summary

This study examines how drivers allocate visual attention during different driving maneuvers, as a route to understanding driver behavior and cognitive decision-making. The authors ask what captures drivers' primary visual focus and why drivers occasionally veer toward obstacles or other vehicles. To address gaps in previous research, the study applies a zero-shot learning technique to relate driver attention to cognitive processes, combining image features from driving scenes with eye gaze data.

The methodology uses the DR(eye)VE dataset, which comprises 555,000 frames across 74 sequences of five minutes each. The data involves eight drivers (seven male, one female, aged 20–40) navigating downtown areas, countryside settings, and highways under varying weather and lighting conditions.

The core analytical tool is the Segment Anything Model (SAM), a deep learning segmentation model used here in a zero-shot setting. SAM consists of an image encoder built on a pre-trained Vision Transformer, a flexible prompt encoder that processes sparse prompts (points, boxes, text) and dense prompts (masks), and a mask decoder that maps these embeddings to segmented attention regions. This approach lets the model detect and segment objects without predetermined labels, generating attention heatmaps and density maps from driver gaze information.

The results indicate that attention allocation varies with driving context. During highway driving, drivers focused predominantly on the road ahead, a behavior the authors attribute to higher speeds, the need to maintain safe following distances, and fewer distractions. City driving, characterized by pedestrians, intersections, and frequent stops, produced more complex attention patterns. The study also found that instances where drivers veered into other lanes or toward obstacles often coincided with their gaze being diverted away from the primary roadway, linking gaze diversion to hazardous maneuvers.

The significance of this research lies in its demonstration that zero-shot learning can segment driver attention without extensive labeled data, offering a scalable method for behavioral research. The findings point to context-specific interventions for road safety. For instance, adaptive driver assistance systems could provide more frequent alerts in city environments, where distraction risks are higher. By clarifying the relationship between visual attention and decision-making, this work contributes to the development of driver monitoring systems and human-vehicle interaction strategies.
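The summary mentions attention heatmaps and density maps built from gaze information, and point prompts as one of SAM's sparse inputs. The sketch below illustrates one common way such a map can be built: summing a Gaussian around each fixation and taking the peak as a candidate point prompt. It is not the authors' implementation. The function names, the grid size, the `sigma` value, and the fixation coordinates are all assumptions made for illustration, and the step of passing the point to SAM is left out.

```python
import math

def gaze_density_map(fixations, width, height, sigma=2.0):
    """Sum an isotropic Gaussian around each (x, y) fixation,
    then scale the map so its highest cell equals 1.0."""
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid

def peak_location(grid):
    """Return the (x, y) cell with the highest density."""
    best = max(
        ((v, x, y) for y, row in enumerate(grid) for x, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return best[1], best[2]

# Hypothetical fixations on a coarse 32x18 grid (not real DR(eye)VE data):
# three clustered near the image center and one stray glance to the top left.
fixations = [(16, 9), (15, 9), (17, 10), (3, 2)]
density = gaze_density_map(fixations, width=32, height=18)

# The densest cell is a natural candidate for a point prompt to a
# promptable segmenter such as SAM (that call is not shown here).
prompt_point = peak_location(density)
print(prompt_point)
```

In this toy input the clustered fixations dominate the map, so the stray glance contributes a small secondary bump without moving the peak. A real pipeline would work at image resolution and would more likely use a vectorized convolution than nested loops.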

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	Crossref	—	—	1	2026-06-25
archive	success	canonical_url	—	—	1	2026-06-26
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-26
chunk	success	chunk	—	—	1	2026-06-26
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-26
enrich	success	openalex	—	—	1	2026-06-26
promote	success	—	—	—	1	2026-06-25
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-26
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: behavioral performance data
Methodological Resource: tool software
Theoretical Contribution: theory or model