Evaluation of outlier detection algorithms for traffic congestion assessment in smart city traffic data from vehicle sensors

Blázquez, Ramona Ruiz; Muñoz-Organero, Mario; Fernández, Luis Sánchez · 2018 · OpenAlex-citations

DOI: 10.1504/ijhvs.2018.10016106

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the challenge of automatically detecting traffic congestion in smart city environments using real-time telemetry data from vehicle sensors. The authors argue that while outlier detection is typically used to remove noise, it can also identify significant abnormal driving conditions, such as traffic jams caused by heavy traffic or accidents. The study aims to evaluate various multivariate outlier detection algorithms and combine them with classification methods to distinguish between general anomalies and specific congestion events, thereby optimizing transportation systems and reducing driver discomfort.

The methodology uses data collected via an Android application called ‘Smart Driver,’ which captures second-by-second measurements from GPS and wearable heart rate sensors. The dataset includes twenty-five variables, such as velocity, acceleration, Positive Acceleration Kinetic Energy (PKE), and heart rate metrics (RR interval and pNN50).

The experimental design consists of two phases. First, the authors analyze 2,183 observations from a single driver over nine days to evaluate the performance of several outlier detection techniques, including the Peña & Prieto algorithm, Minimum Covariance Determinant (MCD), Local Outlier Factor (LOF), k-means clustering, and One-Class Support Vector Machines (SVM). Second, they conduct a validation experiment using data from two drivers over 32 days, with five days explicitly labeled as containing traffic jams. In this phase, the output of the outlier detection stage serves as input for logistic regression and SVM classifiers to determine if detected outliers correspond to congestion.

The results demonstrate that multiple outlier detection methods yield similar outcomes, identifying hundreds of multivariate outliers that are not apparent in univariate analysis. Specifically, the Peña & Prieto algorithm and MCD identified 461 and 484 outliers, respectively, while LOF and k-means produced comparable results despite the data not being normally distributed. In the classification phase, the SVM classifier achieved a 100% hit rate in distinguishing traffic jam days from normal days when using variables derived from outlier counts and velocity metrics. Logistic regression achieved a 93.75% hit rate. A subsequent test using only outlier-dependent variables (total outliers, maximum density, and bursts) resulted in a 96.88% accuracy for both classifiers. The study confirms a high correlation between the number of detected outliers, minimum velocity, and the presence of traffic congestion.

The significance of this work lies in its demonstration that traffic congestion can be effectively identified by combining multivariate outlier detection with supervised classification of vehicle sensor data. The findings suggest that SVM classifiers slightly outperform logistic regression in this context and that clustering methods can be effective even when data distributions are non-normal. This approach provides a viable mechanism for real-time traffic monitoring and incident detection in smart cities. The authors conclude that future work should focus on expanding the dataset to include more drivers and road types, refining variable selection for better discrimination, and developing predictive algorithms to anticipate upcoming traffic incidents.
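The classification phase described above relies on day-level variables derived from the outlier stage (total outliers, maximum density, and bursts) and reports hit rates over the 32 validation days. The following is a minimal Python sketch of how such variables and hit rates could be computed. It is not the authors' implementation: the summary does not define "maximum density" or "bursts," so the definitions below (largest share of flagged seconds in a sliding window, and number of runs of consecutive flagged seconds) are assumptions, as are the function names `day_features` and `hit_rate`, the 60-second default window, and the toy input. The hit rate is assumed to be the share of days classified correctly.

```python
from itertools import groupby


def day_features(flags, window=60):
    """Summarize one day's per-second outlier flags (1 = outlier, 0 = normal).

    All three definitions are illustrative assumptions, not the paper's:
      total_outliers: number of flagged seconds
      max_density:    largest share of flagged seconds in any `window`-second span
      bursts:         number of maximal runs of consecutive flagged seconds
    """
    flags = [int(bool(f)) for f in flags]
    total = sum(flags)

    # groupby collapses consecutive equal values; each group of 1s is one burst.
    bursts = sum(1 for value, _ in groupby(flags) if value == 1)

    # Sliding-window count, updated incrementally as the window moves.
    span = min(window, len(flags))
    max_density = 0.0
    if span > 0:
        running = sum(flags[:span])
        best = running
        for i in range(span, len(flags)):
            running += flags[i] - flags[i - span]
            best = max(best, running)
        max_density = best / span

    return {"total_outliers": total, "max_density": max_density, "bursts": bursts}


def hit_rate(correct_days, total_days):
    """Percentage of days classified correctly, rounded to two decimals."""
    return round(100 * correct_days / total_days, 2)


if __name__ == "__main__":
    toy_day = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]  # made-up flags, not real data
    print(day_features(toy_day, window=4))

    # 32 validation days, as reported in the summary.
    for correct in (32, 31, 30):
        print(correct, hit_rate(correct, 32))
```

Under the assumption that the hit rate is computed per day, the reported figures are consistent with 32, 31, and 30 correctly classified days out of 32 (100%, 96.88%, and 93.75%, respectively).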

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.

| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
| --- | --- | --- | --- | --- | --- | --- |
| discover | success | OpenAlex-citations | | | 1 | 2026-06-20 |
| archive | success | openalex | | | 5 | 2026-06-26 |
| extract | success | cached | | | 2 | 2026-06-26 |
| clean | success | clean | | | 1 | 2026-06-20 |
| chunk | success | chunk | | | 1 | 2026-06-20 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | | 1 | 2026-06-20 |
| promote | success | | | | 1 | 2026-06-20 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26 |
| tag | success | vector_similarity | | | 6 | 2026-06-20 |
| verify | success | | | | 1 | 2026-06-26 |

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.