Assessing Connected Vehicle Data Coverage on New Jersey Roadways
DOI: 10.1109/icite56321.2022.10101453
archive: archived pipeline: cataloged verified
Summary
This study evaluates the spatial and temporal coverage of commercial connected vehicle data (CVD) to determine its suitability for transportation analytics. Specifically, the authors assess the market penetration rate (MPR) and consistency of Wejo Vehicle Movement data across the New Jersey roadway network. The motivation is the growing reliance on probe vehicle data for traffic monitoring, where the reliability of analytics depends on the sample size relative to total traffic flow. The study aims to verify whether this commercial dataset provides a sufficient and representative sample of vehicle volumes across roadway functional classifications.

The methodology compares Wejo trajectory data against ground-truth traffic counts from 46 permanent weigh-in-motion (WIM) stations operated by the New Jersey Department of Transportation. The Wejo dataset, collected between April 15 and June 13, 2021, contained approximately 17 billion trajectory points from 22.18 million journeys. The WIM stations covered four functional classes: interstate highways, other freeways/expressways, principal arterials, and minor arterials. To align the datasets, the authors geofenced each WIM station with a 100-meter buffer and used Apache Spark to process the trajectory data. Market penetration was calculated as the percentage of Wejo-detected vehicles relative to total WIM counts for each hourly window. The analysis also examined spatial distribution via ping intervals and temporal variation across time of day and weekday/weekend periods.

The results indicate that the average MPR ranges from 2.31% to 4.39%, depending on the roadway class. Interstate highways exhibited the lowest mean MPR (2.55%) but the highest consistency (standard deviation of 0.76%). Minor arterials showed the highest mean MPR (4.39%) but also the highest variability (standard deviation of 2.65%), likely due to lower total traffic volumes. Spatial analysis revealed that over 80% of ping intervals on interstate highways were under 60 seconds during daytime hours, indicating a uniform distribution of equipped vehicles. Temporally, the Wejo data closely tracked WIM patterns, with high linear correlations (e.g., $r^2 = 0.977$ for one station) between the two datasets across daily and time-of-day variations. Weekend MPRs were slightly higher than weekday MPRs, particularly on interstates.

The study concludes that despite an MPR below 5%, the Wejo data offers consistent spatial and temporal representation of traffic streams, making it viable for various traffic analytics applications. The authors note that previous research suggests MPRs as low as 1–2% are sufficient for estimating vehicle-miles traveled and signal performance measures. Consequently, this commercial CVD can serve as a cost-effective alternative to traditional traffic counts for tasks such as roadway risk profiling, incident monitoring, and driving behavior analysis, provided users account for variability on lower-volume roadways.
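The geofencing and hourly MPR steps described in the summary can be sketched in plain Python. This is a minimal illustration, not the authors' Spark pipeline: the function names, the ping tuple layout, the station coordinates, and the sample counts are all hypothetical, and counting distinct journey IDs per hour as "Wejo-detected vehicles" is an assumption about how detections were tallied.

```python
import math
from collections import defaultdict
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def hourly_probe_counts(pings, station, radius_m=100.0):
    """Count distinct journeys seen inside the station geofence, per hour.

    pings: iterable of (journey_id, iso_timestamp, lat, lon) tuples
           (a hypothetical layout, not the vendor schema).
    station: (lat, lon) of the count station.
    """
    seen = defaultdict(set)
    for journey_id, ts, lat, lon in pings:
        if haversine_m(lat, lon, station[0], station[1]) <= radius_m:
            hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
            seen[hour].add(journey_id)  # a journey counts once per hour
    return {hour: len(ids) for hour, ids in seen.items()}


def hourly_mpr(probe_counts, wim_counts):
    """MPR (%) = probe vehicles / total station count, for each hour with traffic."""
    return {
        hour: 100.0 * probe_counts.get(hour, 0) / total
        for hour, total in wim_counts.items()
        if total > 0
    }


# Made-up example: one station, four pings, one hourly station count.
station = (40.0, -74.5)
pings = [
    ("a", "2021-04-15T08:05:00", 40.0005, -74.5),  # about 56 m away: inside
    ("a", "2021-04-15T08:05:03", 40.0003, -74.5),  # same journey, same hour
    ("b", "2021-04-15T08:40:00", 39.9996, -74.5),  # inside
    ("c", "2021-04-15T08:50:00", 40.0020, -74.5),  # about 222 m away: outside
]
wim = {datetime(2021, 4, 15, 8): 80}

probe = hourly_probe_counts(pings, station)
mpr = hourly_mpr(probe, wim)
print(mpr)
```

With these made-up inputs, two distinct journeys fall inside the 100-meter buffer against 80 counted vehicles, so the hourly MPR is 2.5%. At the scale reported in the summary (billions of trajectory points), the same filter-then-group logic would need a distributed engine such as the Apache Spark setup the authors used.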
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-25 |
| archive | success | semantic_scholar | — | — | 6 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-26 |
| chunk | success | chunk | — | — | 1 | 2026-06-26 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-26 |
| enrich | success | openalex | — | — | 1 | 2026-06-26 |
| promote | success | — | — | — | 1 | 2026-06-25 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-26 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: observational prevalence
- Methodological Resource: dataset resource