Learning to Find Missing Video Frames with Synthetic Data Augmentation: A General Framework and Application in Generating Thermal Images Using RGB Cameras

Andersen, Mathias Viborg; Greer, Ross; Møgelmose, Andreas; Trivedi, Mohan M. · 2024 · Unknown

DOI: 10.1109/iv55156.2024.10588790

Archive: archived · Pipeline: cataloged, verified

Summary

This paper addresses sensor frame rate mismatches in Advanced Driver Assistance Systems (ADAS), where RGB cameras capture at a high frame rate and thermal cameras at a low one. The mismatch leaves gaps in the thermal stream that hinder real-time, comprehensive driver state monitoring, particularly for dynamic events such as hand movements or posture changes. The authors propose a general framework that uses conditional generative adversarial networks (cGANs) to synthesize realistic thermal images from RGB inputs. The result is a "pseudo-complete" data stream that lets downstream models use high-frequency temporal dynamics without missing thermal information.

The study evaluates two cGAN architectures, pix2pix and CycleGAN, on a dataset of 17 subjects seated in a simulated driver's seat. The dataset comprises synchronized RGB images (approx. 30 fps) and thermal images (<9 fps) captured from four perspectives: front, overhead, profile, and tablet. The researchers tested four input configurations: single front-view RGB and three multi-view formats (tessellated and stacked four-view inputs). They also compared single-subject training against multi-subject training to assess model generalizability. Performance was measured as Average Test L1 Error on normalized pixel values.

Results indicate that pix2pix significantly outperforms CycleGAN, with an average L1 error of 0.0676 compared to 0.2179. The authors attribute CycleGAN's poor performance to its difficulty modeling the lossy translation from visible light to thermal imagery. Multi-view inputs improved accuracy over single-view inputs, and the stacked four-view configuration achieved the lowest error (0.0559), suggesting that spatial relationships across views enhance generation quality. However, models trained on a single subject performed better (error 0.0676) than those trained on the aggregate multi-subject dataset (error 0.1116). Multi-subject training introduced confusion rather than generalizable patterns, indicating that individualized training is currently necessary for optimal performance.

The findings demonstrate that generative models can effectively fill the missing frames caused by sensor rate mismatches, enabling higher-frequency driver state monitoring. While the stacked multi-view approach proved most effective for synthesis, the poor generalization across subjects is a critical limitation. The authors conclude that future research must improve model adaptability across diverse drivers, potentially through small-data fine-tuning, so that a single robust model can be deployed in intelligent vehicle applications.
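
The metric named in the summary, average test L1 error on normalized pixel values, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes 8-bit images scaled to [0, 1] by dividing by 255 (the summary does not state the normalization range), and the function names `normalize`, `l1_error`, and `average_test_l1` are hypothetical.

```python
import numpy as np

def normalize(img_u8):
    """Map 8-bit pixel values to [0, 1] (an assumed normalization)."""
    return img_u8.astype(np.float64) / 255.0

def l1_error(generated_u8, real_u8):
    """Mean absolute per-pixel difference between two same-shape images."""
    if generated_u8.shape != real_u8.shape:
        raise ValueError("images must have the same shape")
    diff = normalize(generated_u8) - normalize(real_u8)
    return float(np.mean(np.abs(diff)))

def average_test_l1(pairs):
    """Average the per-image L1 error over (generated, real) test pairs."""
    errors = [l1_error(generated, real) for generated, real in pairs]
    return sum(errors) / len(errors)
```

Under this assumed scaling, an error of 0 means the generated and real thermal images are identical, and an error of 1 means every pixel is at the opposite end of the range. A different normalization range would rescale the reported values accordingly.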

Key finding

The pix2pix architecture trained on stacked multi-view RGB inputs from individual subjects produces the most accurate synthetic thermal images, significantly outperforming CycleGAN and multi-subject training approaches.
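
The difference between a stacked and a tessellated four-view input can be sketched as below. The summary does not define either layout, so this is one plausible reading rather than the authors' implementation: "stacked" is assumed to mean concatenating the four views along the channel axis, and "tessellated" is assumed to mean tiling them into a 2 x 2 grid. The view order and the helper names `stack_views` and `tessellate_views` are hypothetical.

```python
import numpy as np

# View names taken from the summary; the ordering is an assumption.
VIEWS = ("front", "overhead", "profile", "tablet")

def stack_views(frames):
    """Concatenate four H x W x 3 views along channels -> H x W x 12."""
    return np.concatenate([frames[view] for view in VIEWS], axis=-1)

def tessellate_views(frames):
    """Tile four H x W x 3 views into a 2 x 2 grid -> 2H x 2W x 3."""
    top = np.concatenate([frames["front"], frames["overhead"]], axis=1)
    bottom = np.concatenate([frames["profile"], frames["tablet"]], axis=1)
    return np.concatenate([top, bottom], axis=0)
```

In the stacked layout every pixel location carries all four views at once, while the tessellated layout keeps three channels and places the views in separate image regions.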

Methodology

lab_experiment

Sample size: 17

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	author_sweep	—	—	2	2026-05-28
archive	success	unpaywall	—	—	2	2026-06-04
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	success	—	—	—	1	2026-05-28
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

distraction detection algorithms