A data–information–knowledge cycle for modeling driving behavior
DOI: 10.1016/j.trf.2021.12.017
Archive status: archived; pipeline status: cataloged, verified.
Summary
This paper addresses the need for a holistic framework to model driving behavior in the era of autonomous vehicles (AVs). As transportation shifts toward automation, understanding human-machine interactions and assessing the network-level impacts of AVs require robust behavioral modeling. The authors identify a gap in the existing literature, which typically treats data collection and information extraction as separate processes. To bridge this gap, the study proposes a "data–information–knowledge" cycle that integrates these steps, aiming to provide a comprehensive overview for planning future research and policy decisions on driving behavior modeling.

The methodology is an extensive literature review following the PRISMA guidelines. The authors searched Scopus, Google Scholar, and IEEE Xplore using keywords such as "data collection," "information extraction," and "autonomous vehicles." From an initial pool of 161 studies, approximately 30 papers were selected for detailed analysis. The review was structured around three research questions: how driving behavior data is collected, how knowledge is extracted from it to model behavior, and how these components can be integrated into a unified cycle. The selected studies were categorized by vehicle mode and automation level, with a focus on naturalistic driving studies (NDS) and statistical learning methods.

The findings detail the components of the data collection phase, highlighting the use of Data Acquisition Systems (DAS) equipped with sensors, GPS, radar, and cameras to capture vehicle kinematics, environmental context, and driver demographics. The review notes significant challenges in this phase, including data heterogeneity, massive data volumes (e.g., petabytes of data in large NDS), and synchronization issues. For information extraction, the paper identifies various statistical learning methods used to predict or mimic driving behavior.
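The summary names synchronization as a data-collection challenge but does not say how the reviewed studies handled it. As an illustration only, here is a minimal sketch that assumes each sensor stream is a list of timestamps in seconds, sorted ascending, and aligns a slower stream onto a faster reference clock by nearest-timestamp matching within a tolerance. The function name `align_nearest`, the tolerance, and the sample values are hypothetical, and nearest-neighbour matching is only one possible strategy (interpolation is a common alternative).

```python
from __future__ import annotations

from bisect import bisect_left
from typing import Optional, Sequence


def align_nearest(
    ref_times: Sequence[float],
    src_times: Sequence[float],
    src_values: Sequence[float],
    tolerance: float,
) -> list[Optional[float]]:
    """For each reference timestamp, return the source sample closest in time,
    or None if that sample is more than `tolerance` seconds away.

    Assumes `src_times` is sorted ascending (hypothetical stream layout).
    """
    if len(src_times) != len(src_values):
        raise ValueError("src_times and src_values must have equal length")
    aligned: list[Optional[float]] = []
    for t in ref_times:
        # Index of the first source sample at or after t.
        i = bisect_left(src_times, t)
        best: Optional[int] = None
        # Only the neighbours on either side of t can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(src_times):
                # Strict '<' keeps the earlier sample on an exact tie.
                if best is None or abs(src_times[j] - t) < abs(src_times[best] - t):
                    best = j
        if best is not None and abs(src_times[best] - t) <= tolerance:
            aligned.append(src_values[best])
        else:
            aligned.append(None)  # no sample close enough: leave a gap
    return aligned


if __name__ == "__main__":
    # Hypothetical data: a sparse GPS speed stream aligned onto 4 Hz kinematic timestamps.
    kinematics_t = [0.00, 0.25, 0.50, 0.75, 1.00]
    gps_t = [0.02, 0.51, 1.03]
    gps_speed = [13.9, 14.1, 14.4]
    print(align_nearest(kinematics_t, gps_t, gps_speed, tolerance=0.1))
    # → [13.9, None, 14.1, None, 14.4]
```

Returning `None` instead of silently borrowing a distant sample keeps alignment gaps visible, which matters when heterogeneous streams drop out at different times.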
The authors synthesize these findings into a framework for data analytics and fusion and demonstrate its application through examples such as assessing the network-level impacts of AVs and evaluating user acceptance of automated driving systems.

The significance of this work lies in its provision of a transferable, holistic framework that connects raw data collection to actionable knowledge. By integrating the challenges and methods of both data acquisition and information extraction, the paper offers a checklist for researchers planning new studies on driving behavior. This integrated approach facilitates better project planning, addresses data quality and privacy concerns early in the design phase, and supports the development of models that can be replicated at larger scales. The authors suggest that the framework can be extended to other transportation sectors, aiding the broader understanding of human-machine interactions in automated environments.
Provenance
This is the full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Last completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-20 |
| archive | success | openalex | — | — | 5 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-20 |
| chunk | success | chunk | — | — | 1 | 2026-06-20 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-20 |
| enrich | success | openalex | — | — | 1 | 2026-06-20 |
| promote | success | — | — | — | 1 | 2026-06-20 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-20 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
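The table lists `vector_similarity` as the tool for the `tag` stage and `Qwen/Qwen3-Embedding-8B` for the `embed` stage, but the page does not document how tags are chosen. A minimal sketch, assuming tagging compares the paper's embedding with precomputed topic embeddings by cosine similarity, keeps topics above a threshold, and ranks them by score. The function names, the `threshold` and `top_k` values, and the toy vectors are hypothetical and are not the pipeline's actual code.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_topics(
    doc_vec: Sequence[float],
    topic_vecs: Mapping[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off
    top_k: int = 5,          # hypothetical maximum number of tags
) -> list[tuple[str, float]]:
    """Score every topic against the document, drop those below `threshold`,
    and return up to `top_k` (topic, score) pairs, most similar first."""
    scored = [(name, cosine(doc_vec, vec)) for name, vec in topic_vecs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings are far higher-dimensional.
    doc = [1.0, 1.0, 0.0]
    topics = {
        "driving behavior": [1.0, 0.9, 0.1],
        "data collection": [1.0, 0.0, 0.0],
        "unrelated topic": [0.0, 0.0, 1.0],
    }
    for name, score in rank_topics(doc, topics):
        print(f"{name}: {score:.3f}")
```

Sorting by score is what would let a page present topics "ranked by relevance"; the threshold keeps weakly related topics off the list entirely.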
Topics
Ranked by relevance to this paper.
Information type
The kind of knowledge this paper contributes, grouped by family and independent of topic (what it is about) and method (how it was studied).
- Methodological Resource: dataset resource, tool software
- Theoretical Contribution: computational model