Automated Machine Learning Pipeline for Traffic Count Prediction
archive: archived | pipeline: cataloged | verified
Summary
This study addresses the challenge of accurately predicting long-term traffic volumes, a critical task for traffic management and resource allocation. The authors identify a gap in existing literature, noting that previous studies often rely on limited datasets, selective predictors, and single algorithms, leading to inconsistent results regarding whether linear or nonlinear models are superior. To resolve this, the paper proposes a universal automated machine learning framework capable of identifying the optimal feature-selection method and modeling approach for any given dataset, thereby reducing the mean absolute percentage error (MAPE) and minimizing the expertise required for forecasting.
The methodology employs a comprehensive dataset from the Florida Department of Transportation, covering historical traffic counts for passenger vehicles across 259 co-sites on six interstate highways between 2001 and 2017. This dataset comprises 52,836 data points and utilizes 59 predictors, including socioeconomic factors, road characteristics, and financial market indicators. The automated pipeline integrates robust cross-validation and hyperparameter optimization via grid search to evaluate five linear and four nonlinear algorithms. This process automatically selects the best modeling approach and feature-selection strategy specific to the data, distinguishing it from static model optimization techniques tied to single case studies.
The results demonstrate that nonlinear models consistently outperformed linear models in predicting monthly traffic volumes for the Florida case study. The automated framework successfully identified the most appropriate algorithms and features to minimize prediction errors, validating its effectiveness in handling the complex, dynamic nature of traffic flow. By leveraging a broad set of predictors and comparing multiple algorithmic approaches, the study confirms that the relationship between local/global variables and traffic flow is often nonlinear, necessitating more sophisticated modeling than traditional econometric or linear regression methods.
The significance of this work lies in the development of a customizable, universal framework that transportation planners can apply to various projects. Unlike traditional travel-demand models, which are time-consuming and require substantial data collection resources, this automated pipeline allows users to incorporate local data and predictors to generate optimized, project-specific models. This capability aids policymakers in identifying critical road links prone to overcapacity and congestion, facilitating more efficient allocation of limited resources for highway expansion and intelligent traffic management services.
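The selection loop described in the summary (cross-validate every candidate in a hyperparameter grid, score each by MAPE, keep the lowest) can be illustrated with a minimal sketch. This is not the authors' pipeline: the summary does not name the five linear and four nonlinear algorithms, so a one-feature ridge regression and a k-nearest-neighbour regressor stand in for the two families, the data is synthetic, feature selection is omitted, and every function name here (`mape`, `fit_ridge_line`, `fit_knn`, `cv_mape`, `select_best`) is hypothetical.

```python
from statistics import mean

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return 100.0 * mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))

def fit_ridge_line(alpha):
    """Linear stand-in: one-feature ridge regression in closed form."""
    def fit(xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
        return lambda x: my + slope * (x - mx)
    return fit

def fit_knn(k):
    """Nonlinear stand-in: k-nearest-neighbour regression."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit

def cv_mape(fit, xs, ys, n_folds=5):
    """Average MAPE over interleaved k-fold cross-validation."""
    scores = []
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        test = [i for i in range(len(xs)) if i % n_folds == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in test]
        scores.append(mape([ys[i] for i in test], preds))
    return mean(scores)

def select_best(candidates, xs, ys):
    """Grid search: score every candidate and return the lowest-MAPE one."""
    results = {name: cv_mape(fit, xs, ys) for name, fit in candidates.items()}
    return min(results, key=results.get), results

# Synthetic data (an assumption): volume grows quadratically with one predictor.
xs = [i / 10 for i in range(60)]
ys = [1000 + 50 * x ** 2 for x in xs]

# The grid: three settings per family.
candidates = {f"ridge(alpha={a})": fit_ridge_line(a) for a in (0.0, 1.0, 10.0)}
candidates.update({f"knn(k={k})": fit_knn(k) for k in (1, 3, 5)})

best_name, scores = select_best(candidates, xs, ys)
```

On this deliberately nonlinear toy data a k-nearest-neighbour setting comes out ahead of every ridge setting, which mirrors the direction of the Florida result but says nothing about real traffic counts. In practice the same loop would more likely be built on an established library with proper grid-search and cross-validation utilities.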
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-18 |
| archive | success | openalex | — | — | 5 | 2026-06-25 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-18 |
| chunk | success | chunk | — | — | 1 | 2026-06-18 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-18 |
| promote | success | — | — | — | 1 | 2026-06-18 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-18 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.