A data–information–knowledge cycle for modeling driving behavior

Al Haddad, Christelle; Antoniou, Constantinos · 2022 · Crossref

DOI: 10.1016/j.trf.2021.12.017

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the need for a holistic framework to model driving behavior in the era of autonomous vehicles (AVs). As transportation shifts toward automation, understanding human-machine interactions and assessing the network-level impacts of AVs requires robust behavioral modeling. The authors identify a gap in existing literature, which typically treats data collection and information extraction as separate processes. To bridge this gap, the study proposes a "data–information–knowledge" cycle that integrates these steps, aiming to provide a comprehensive overview for planning future research and policy decisions regarding driving behavior modeling.

The methodology involves an extensive literature review following PRISMA guidelines. The authors searched Scopus, Google Scholar, and IEEE Xplore using keywords such as "data collection," "information extraction," and "autonomous vehicles." From an initial pool of 161 studies, approximately 30 papers were selected for detailed analysis. The review was structured to answer three research questions: how driving behavior data is collected, how knowledge is extracted to model behavior, and how these components can be integrated into a unified cycle. The selected studies were categorized by vehicle mode and automation level, with a focus on naturalistic driving studies (NDS) and statistical learning methods.

The findings detail the components of the data collection phase, highlighting the use of Data Acquisition Systems (DAS) equipped with sensors, GPS, radar, and cameras to capture vehicle kinematics, environmental context, and driver demographics. The review notes significant challenges in this phase, including data heterogeneity, massive data volumes (e.g., petabytes of data in large NDS), and synchronization issues. For information extraction, the paper identifies various statistical learning methods used to predict or mimic driving behavior. The authors synthesize these findings to develop a framework for data analytics and fusion, demonstrating its application through examples such as assessing AV impacts on network levels and evaluating user acceptance of automated driving systems.

The significance of this work lies in its provision of a transferable, holistic framework that connects raw data collection to actionable knowledge. By integrating the challenges and methods of both data acquisition and information extraction, the paper offers a checklist for researchers planning new studies on driving behavior. This integrated approach facilitates better project planning, addresses data quality and privacy concerns early in the design phase, and supports the development of models that can be replicated at larger scales. The authors suggest that this framework can be extended across various transportation sectors, aiding in the broader understanding of human-machine interactions in automated environments.

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	Crossref	—	—	1	2026-06-20
archive	success	openalex	—	—	5	2026-06-26
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-20
chunk	success	chunk	—	—	1	2026-06-20
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-20
enrich	success	openalex	—	—	1	2026-06-20
promote	success	—	—	—	1	2026-06-20
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-20
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Methodological Resource: dataset resource, tool software
Theoretical Contribution: computational model