Gated Recurrent Fusion to Learn Driving Behavior from Temporal Multimodal Data
archive: archived | pipeline: cataloged | verified
Summary
This paper addresses the challenge of modeling tactical driver behavior in autonomous navigation by effectively fusing temporal multimodal data, such as video, LiDAR, and CAN bus signals. Existing deep learning approaches typically handle sensor fusion or driver policy learning separately, or employ simple pre- or post-concatenation strategies that fail to explicitly fuse multi-sensor data and often result in inefficient parameter spaces. The authors argue that autonomous driving datasets present unique challenges, including disproportionate sensor data sizes, intermittent quality degradation, and complex inter-dependencies, necessitating a more robust temporal fusion mechanism.
To solve this, the authors propose the Gated Recurrent Fusion Unit (GRFU), a novel recurrent neural network architecture inspired by Long Short-Term Memory (LSTM) gating mechanisms. The GRFU simultaneously learns fusion weighting and temporal weighting. The methodology involves three specific architectural variations: Late Recurrent Summation (LRS), which uses parallel LSTM cells for each sensor; Early Gated Recurrent Fusion (EGRF), which learns linear interpolation weights for sensor encodings; and the proposed Late Gated Recurrent Fusion (LGRF), which combines independent memory control for each sensor with learned fusion gates. These models are designed to learn optimal linear interpolation between sensors, allowing the network to dynamically adjust the contribution of each sensor based on data quality and relevance.
The proposed models were evaluated on two datasets: the Honda Driving Dataset (HDD) for driver behavior classification and the TORCS simulator for steering angle regression. On the HDD dataset, which involves classifying twelve driver actions from video and CAN signals, the LGRF model achieved a 10% improvement in mean Average Precision (mAP) over state-of-the-art baselines. For steering angle regression on TORCS, the model demonstrated a 20% reduction in Mean Squared Error (MSE). The study also included ablation studies comparing the proposed methods against early fusion (concatenation/summation) and late fusion baselines, confirming that the gating mechanisms provide superior performance by allowing the network to modulate fusion processes at each time step.
The significance of this work lies in its introduction of an interpretable fusion model for autonomous navigation. By using global average pooling on the fusion gates, the model provides insights into each sensor’s contribution, enabling verification of model decisions and higher-level intervention during sensor failures. The results demonstrate that explicit, learned temporal fusion outperforms standard concatenation methods, particularly in scenarios involving occlusion or noisy sensor data. This approach offers a more efficient and robust framework for understanding driving behavior from rich multimodal signals, addressing a critical gap in current autonomous driving research.
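The gated interpolation and gate pooling described in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary does not give the exact gate equations, so the sketch assumes a softmax across sensors to produce the interpolation weights, omits the LSTM memory update entirely, and uses random, untrained parameters. The names `gated_fusion_step` and `sensor_contributions`, and the shapes `S`, `D`, `T`, are hypothetical.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion_step(encodings, weights, biases):
    """Fuse S sensor encodings of size D for one time step.

    Assumption: each gate sees the concatenation of all sensor encodings,
    and a softmax over the sensor axis turns gate logits into weights
    that sum to 1 per feature (a learned linear interpolation).
    encodings: (S, D), weights: (S, D, S*D), biases: (S, D).
    """
    context = encodings.reshape(-1)          # (S*D,) joint view of all sensors
    logits = weights @ context + biases      # (S, D) one gate vector per sensor
    gates = softmax(logits, axis=0)          # normalize across sensors
    fused = (gates * encodings).sum(axis=0)  # (D,) convex combination
    return fused, gates


def sensor_contributions(gate_seq):
    """Average gates over time and features: one score per sensor.

    gate_seq: (T, S, D). This mirrors the idea of global average pooling
    on the fusion gates; the paper's exact pooling may differ.
    """
    return gate_seq.mean(axis=(0, 2))


# Toy data: S=2 sensors (say, video and CAN encodings), D=4 features, T=5 steps.
rng = np.random.default_rng(0)
S, D, T = 2, 4, 5
weights = rng.normal(scale=0.1, size=(S, D, S * D))  # untrained, illustrative
biases = np.zeros((S, D))
sequence = rng.normal(size=(T, S, D))

steps = [gated_fusion_step(sequence[t], weights, biases) for t in range(T)]
fused_seq = np.stack([fused for fused, _ in steps])  # (T, D)
gate_seq = np.stack([gates for _, gates in steps])   # (T, S, D)
contribution = sensor_contributions(gate_seq)        # (S,)
print("per-sensor contribution:", contribution)
```

Because the gates sum to 1 across sensors, each fused feature stays between the smallest and largest sensor value for that feature, and the pooled contributions also sum to 1. That makes them straightforward to read as relative sensor importance. In a trained model the gate parameters would be learned jointly with the recurrent cells rather than sampled at random.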
Key finding
The proposed Gated Recurrent Fusion Units achieve superior performance in driver behavior classification and steering angle regression by simultaneously learning optimal fusion and temporal weights from multimodal sensor data.
Methodology
simulation_modeling
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-27.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | author_sweep | — | — | 2 | 2026-05-27 |
| archive | success | unpaywall | — | — | 2 | 2026-06-04 |
| extract | success | cached | — | — | 3 | 2026-06-10 |
| clean | success | clean | — | — | 1 | 2026-06-04 |
| chunk | success | chunk | — | — | 1 | 2026-06-04 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-04 |
| enrich | success | semantic_scholar | — | — | 2 | 2026-06-04 |
| promote | success | — | — | — | 1 | 2026-06-04 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 2 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 15 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Theoretical Contribution: computational model