Analysis of SHRP2 Speeding Data: Methods Used to Conduct the Research
Summary
This report details the methods and datasets used to analyze speeding behavior in the Strategic Highway Research Program 2 (SHRP2) Naturalistic Driving Study (NDS). The primary objective was to prepare data reductions that support subsequent analyses of driver speeding, with a focus on extracting free-flow episodes (FFEs) and speeding episodes (SEs). The document serves as a technical guide for researchers navigating SHRP2 data acquisition, management, and processing. The study prioritized a broad sample of drivers across six data collection sites over a large number of trips per driver, leveraging the scale of the SHRP2 repository.
The research employed a three-component workflow: data management, data acquisition, and data processing. Data management involved storing raw and intermediate data in a Relational Database Management System (RDBMS), ensuring security, and maintaining data quality. Data acquisition required identifying variables of interest, establishing data sharing agreements, and retrieving time-series data from the Virginia Tech Transportation Institute (VTTI) alongside roadway information from the Roadway Information Database (RID). Data processing used custom-developed software tools to automate the cleaning and parsing of high-frequency (1-Hz) vehicle speed recordings. Processing steps included ingesting trip time-series data, calculating point geometry and linear referencing system measures, identifying posted speed limits, and handling missing or erroneous GPS coordinates.
The pipeline parsed the cleaned time-series data into Trips, FFEs, and SEs. FFEs were defined as periods of travel at speeds above a threshold set 5 mph below the posted speed limit, representing opportunities to speed. SEs were defined as periods of travel at speeds above the speeding threshold of 10 mph over the posted speed limit. The software calculated descriptive statistics for these episodes to characterize driving behavior.
The project followed a two-phase approach: a test phase using data from the Washington and Pennsylvania sites to debug and validate the tools, followed by an analysis phase that processed data from all six sites. Quality testing and validation were integrated throughout to address issues such as missing route IDs, incorrect LinkIDs, and consecutive duplicate values. The final data reductions contain counts of trips, FFEs, and SEs suitable for statistical analysis.
The report highlights lessons learned about variable selection, data request protocols, and the challenges of working with naturalistic driving data, such as interpreting RID variables and managing data scope. The authors emphasize that the volume of data makes automated processing tools necessary, and they offer suggestions for improving data cleaning and tool functionality in future research. The work establishes a reproducible framework for extracting and analyzing speeding behavior from large-scale naturalistic driving datasets.
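The episode definitions in the summary can be illustrated with a minimal Python sketch. It is not the report's software: the function and class names, the rule that a missing speed or speed limit ends an episode, and the absence of any minimum episode duration are all assumptions made for illustration. Only the two thresholds (5 mph below and 10 mph above the posted speed limit) and the 1-Hz sampling come from the summary.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence

# Offsets relative to the posted speed limit, taken from the summary.
FFE_OFFSET_MPH = -5.0  # free-flow: speed > limit - 5 mph
SE_OFFSET_MPH = 10.0   # speeding:  speed > limit + 10 mph


@dataclass
class Episode:
    """One contiguous run of samples above a threshold (hypothetical structure)."""
    start: int         # index of the first sample in the run
    end: int           # index of the last sample in the run, inclusive
    mean_speed: float  # mean speed over the run, mph
    max_speed: float   # maximum speed over the run, mph

    @property
    def duration_s(self) -> int:
        # At 1 Hz, each sample represents one second of travel.
        return self.end - self.start + 1


def extract_episodes(
    speeds: Sequence[Optional[float]],
    limits: Sequence[Optional[float]],
    offset_mph: float,
) -> List[Episode]:
    """Return contiguous runs where speed > posted limit + offset_mph.

    Assumption: a sample with a missing speed or limit ends the current run.
    """
    episodes: List[Episode] = []
    start: Optional[int] = None
    n = min(len(speeds), len(limits))
    # Iterate one step past the end so a run that reaches the last sample is closed.
    for i in range(n + 1):
        above = False
        if i < n:
            v, lim = speeds[i], limits[i]
            above = v is not None and lim is not None and v > lim + offset_mph
        if above and start is None:
            start = i
        elif not above and start is not None:
            run = speeds[start:i]  # contains no None values by construction
            episodes.append(Episode(start, i - 1, mean(run), max(run)))
            start = None
    return episodes


# Illustrative 1-Hz trace on a road posted at 55 mph (made-up values).
speeds = [50, 52, 56, 66, 67, 58, 40, None, 57]
limits = [55] * len(speeds)
ffes = extract_episodes(speeds, limits, FFE_OFFSET_MPH)
ses = extract_episodes(speeds, limits, SE_OFFSET_MPH)
```

Using one function with a signed offset keeps the FFE and SE logic identical apart from the threshold, so any change to gap handling applies to both episode types at once.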
Key finding
The research established a validated workflow for processing SHRP2 naturalistic driving data to extract and characterize speeding episodes and free-flow conditions across multiple data collection sites.
Methodology
dataset
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | rosap | — | — | 2 | 2026-05-23 |
| archive | success | — | — | — | 1 | 2026-05-23 |
| extract | success | cached | — | — | 2 | 2026-06-10 |
| clean | success | — | — | — | 1 | 2026-06-01 |
| chunk | success | — | — | — | 1 | 2026-06-01 |
| embed | success | — | — | — | 1 | 2026-06-02 |
| enrich | success | — | — | — | 1 | 2026-05-23 |
| promote | success | — | — | — | 1 | 2026-05-23 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 3 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 19 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: crash risk outcomes, observational prevalence
- Methodological Resource: dataset resource