Towards Driver Behavior Understanding: Weakly-Supervised Risk Perception in Driving Scenes

Agarwal, Nakul; Chen, Yi-Ting; Dariush, Behzad · 2026 · arXiv

URL: http://arxiv.org/abs/2603.05926v1

archive: archived pipeline: cataloged verified

Abstract

Achieving zero-collision mobility remains a key objective for intelligent vehicle systems, which requires understanding driver risk perception, a complex cognitive process shaped by voluntary response of the driver to external stimuli and the attentiveness of surrounding road users towards the ego-vehicle. To support progress in this area, we introduce RAID (Risk Assessment In Driving scenes), a large-scale dataset specifically curated for research on driver risk perception and contextual risk assessment. RAID comprises 4,691 annotated video clips, covering diverse traffic scenarios with labels for driver's intended maneuver, road topology, risk situations (e.g., crossing pedestrians), driver responses, and pedestrian attentiveness. Leveraging RAID, we propose a weakly supervised risk object identification framework that models the relationship between driver's intended maneuver and responses to identify potential risk sources. Additionally, we analyze the role of pedestrian attention in estimating risk and demonstrate the value of the proposed dataset. Experimental evaluations demonstrate that our method achieves 20.6% and 23.1% performance gains over prior state-of-the-art approaches on the RAID and HDDS datasets, respectively.

Summary

Agarwal, Chen, and Dariush introduce RAID, a 4,691-clip naturalistic driving dataset designed for driver-centric risk perception research. Clips are annotated with intended maneuver, road topology, risk situations, driver responses, pedestrian attention, face boxes, and pedestrian tracklets. They propose a weakly-supervised risk-object-identification framework that links the driver's intended maneuver to the driver's response to localize risk sources, and they analyze how pedestrian attentiveness contributes to risk estimation. The method reports performance gains of 20.6% on RAID and 23.1% on HDDS over prior state-of-the-art approaches, providing a dataset and baseline for behavioral, attention-aware risk modeling in automated driving and driver-assistance systems.

Key finding

A weakly-supervised model linking driver intended maneuver, driver response, and pedestrian attentiveness improved risk-object identification by 20.6% on RAID and 23.1% on HDDS over prior state-of-the-art approaches.

Methodology

Computer-vision dataset and modeling paper. Curated RAID, a 4,691-clip naturalistic San Francisco driving video dataset with annotations for intended maneuver, road topology, risk situations, driver action/response, pedestrian attention, face boxes, and pedestrian tracklets. Trained a weakly-supervised risk-object-identification model that conditions on driver maneuver and response and incorporates pedestrian attentiveness; benchmarked on RAID and HDDS against prior approaches.

Sample size: 4,691 annotated video clips (no human-participant sample)

Quality score: 5 / 5

Topics