Assessing the effectiveness of data mining tools in classifying and predicting road traffic congestion
DOI: 10.11591/ijeecs.v34.i2.pp1295-1303
archive: archived pipeline: cataloged verified
Summary
This study addresses the challenge of predicting road traffic congestion, a significant issue for urban environments, commuters, and economies. While machine learning (ML) offers potential for accurate short-term forecasting, selecting the best data mining tool and classifier remains difficult. The research compares two open-source data mining tools, WEKA and Orange, on the task of classifying traffic conditions as either normal or congested. The motivation is to identify which platform provides better predictive performance for traffic management systems, particularly in Amman, Jordan.
The methodology uses a dataset provided by the Greater Amman Municipality for King Abdullah Street in 2018. The data, collected via sensors, includes hourly traffic volume, flow, capacity, number of lanes, and road width. After preprocessing to remove duplicates and correct structural issues, the dataset was used to train and test four classification algorithms: Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The models were evaluated using 10-fold cross-validation with a 70/30 train-test split. Performance was assessed using accuracy, sensitivity, precision, and F-measure derived from confusion matrices.
The results show that Orange generally outperformed WEKA. Using Orange, RF and LR achieved 100% accuracy, while KNN reached 99.8% and SVM 99.1%. WEKA yielded lower accuracy for most classifiers: SVM was highest at 99.7%, followed by KNN at 98.7%, LR at 97.6%, and RF at 96.2%. Across the reported metrics, including sensitivity and precision, Orange scored higher than or equal to WEKA for most classifiers; the exception visible in the accuracy figures above is SVM, where WEKA (99.7%) exceeded Orange (99.1%). The study notes that these results surpass the accuracy rates reported in previous literature, which ranged from 84% to 95%.
The significance of this work lies in its empirical comparison of widely used data mining platforms for traffic prediction. The findings suggest that Orange is the more effective tool for this specific application, offering superior accuracy and reliability in classifying traffic congestion. This conclusion aids researchers and engineers in selecting appropriate software for developing intelligent transportation systems. By validating high-accuracy predictions using local data, the study supports the integration of ML-based tools into urban traffic management to optimize flow and reduce congestion-related inefficiencies.
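The four evaluation metrics named in the summary (accuracy, sensitivity, precision, and F-measure) all follow from the counts in a binary confusion matrix. The sketch below is illustrative only and is not taken from the paper: it assumes "congested" is treated as the positive class, and the `ConfusionMatrix` type, the function names, and the example counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; "congested" is assumed to be the positive class."""
    tp: int  # congested, predicted congested
    fp: int  # normal, predicted congested
    fn: int  # congested, predicted normal
    tn: int  # normal, predicted normal


def confusion_from_labels(actual: Sequence[str],
                          predicted: Sequence[str],
                          positive: str = "congested") -> ConfusionMatrix:
    """Count the four outcomes from paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def accuracy(cm: ConfusionMatrix) -> float:
    """Share of all observations that were classified correctly."""
    return _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.fn + cm.tn)


def sensitivity(cm: ConfusionMatrix) -> float:
    """Share of truly congested observations that were detected (recall)."""
    return _ratio(cm.tp, cm.tp + cm.fn)


def precision(cm: ConfusionMatrix) -> float:
    """Share of predicted-congested observations that were truly congested."""
    return _ratio(cm.tp, cm.tp + cm.fp)


def f_measure(cm: ConfusionMatrix) -> float:
    """Harmonic mean of precision and sensitivity (F1)."""
    p, r = precision(cm), sensitivity(cm)
    return _ratio(2 * p * r, p + r)


if __name__ == "__main__":
    # Hypothetical counts for illustration; not results from the study.
    example = ConfusionMatrix(tp=40, fp=10, fn=5, tn=45)
    for name, fn_ in [("accuracy", accuracy), ("sensitivity", sensitivity),
                      ("precision", precision), ("F-measure", f_measure)]:
        print(f"{name}: {fn_(example):.3f}")
```

Under 10-fold cross-validation, one common approach is to sum the per-fold confusion matrices and compute the metrics once from the totals. Whether WEKA and Orange were configured to aggregate that way in the study is not stated in the summary.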
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | Crossref | — | — | 1 | 2026-06-20 |
| archive | success | canonical_url | — | — | 1 | 2026-06-26 |
| extract | success | cached | — | — | 2 | 2026-06-26 |
| clean | success | clean | — | — | 1 | 2026-06-20 |
| chunk | success | chunk | — | — | 1 | 2026-06-20 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-20 |
| promote | success | — | — | — | 1 | 2026-06-20 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | — | — | 6 | 2026-06-20 |
| verify | success | — | — | — | 1 | 2026-06-26 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
Topics
Ranked by relevance to this paper.