Application of Extremely Randomised Trees for exploring influential factors on variant crash severity data

Afshar, Farshid; Seyedabrishami, Seyedehsan; Moridpour, Sara · 2022 · Crossref

DOI: 10.1038/s41598-022-15693-7

archive: archived · pipeline: cataloged verified


Summary

This study addresses the challenge of identifying influential factors in traffic crash severity, specifically within rural areas of Khorasan Razavi, Iran, a region with high casualty rates. Traditional statistical models often rely on linear assumptions and static data, limiting their ability to handle the complex, imbalanced, and variant nature of crash data. To overcome these limitations, the authors apply Extremely Randomised Trees (ERT), a machine learning ensemble technique, to analyze crash severity. The motivation stems from the need for more robust models that can capture non-linear relationships and handle high variance in data, such as driver behavior and environmental conditions, which traditional methods may overlook.

The research utilizes a dataset of 1,427 crashes recorded between 2013 and 2017, sourced from police reports and real-time traffic data collected via inductive loop detectors. Crash severity was categorized into three ordered levels: Property Damage Only (PDO), injury, and fatal. The model incorporated 31 variables, including traffic conditions (speed, flow), vehicle specifications, movement types, land use, temporal characteristics, and environmental factors. The ERT model was developed using Classification and Regression Trees (CART) as base learners, with parameters tuned via grid search and cross-validation. The dataset was split into 80% training and 20% testing sets. Model performance was evaluated using accuracy, precision, recall, and F-measure. To interpret the results, the authors employed Feature Importance Analysis (FIA), Partial Dependence Plots (PDP), and Individual Conditional Expectation (ICE) plots.

The findings indicate that the involvement of vulnerable road users, particularly motorcyclists and pedestrians, and specific traffic variables are the most significant predictors of crash severity. The analysis revealed that the presence of motorcycles increases the probability of injury crashes by approximately 30% and nearly doubles the probability of fatal crashes. Furthermore, interaction analyses using PDPs showed that driving speeds exceeding 60 km/h in residential areas raise the probability of injury crashes by about 10%. Additionally, at speeds higher than 70 km/h, the presence of pedestrians increases the probability of fatal crashes by approximately 6%. The ERT model effectively handled the imbalanced nature of the data, where fatal crashes constituted only 4% of the observations.

The significance of this study lies in its application of ERT to rural crash severity analysis, a method not previously utilized in this context. By leveraging real-time traffic data and advanced machine learning techniques, the research provides a more nuanced understanding of crash dynamics compared to traditional statistical approaches. The results highlight the critical role of speed and vulnerable road user presence in determining severity, offering actionable insights for traffic safety interventions. This approach demonstrates the potential of ensemble learning methods to improve prediction accuracy and interpretability in transportation safety research, particularly in developing countries with limited real-time data infrastructure.
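The modelling workflow described in the summary (80/20 split, grid search with cross-validation, accuracy/precision/recall/F-measure, feature importance, PDP and ICE) might look roughly like the sketch below. It is not the authors' code: it assumes scikit-learn's `ExtraTreesClassifier` as the ERT implementation, uses synthetic data from `make_classification` in place of the crash records, and the class weights, parameter grid, `class_weight="balanced"` setting, and macro averaging are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.inspection import partial_dependence
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the crash data (NOT the paper's dataset):
# 1,427 rows, 31 features, three classes (0 = PDO, 1 = injury, 2 = fatal).
# Only the roughly 4% fatal share comes from the summary; the other
# class proportions are assumed.
X, y = make_classification(
    n_samples=1427,
    n_features=31,
    n_informative=8,
    n_classes=3,
    weights=[0.60, 0.36, 0.04],
    random_state=0,
)

# 80% training / 20% testing, stratified so the rare class appears in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

# Hypothetical parameter grid; the summary does not list the tuned values.
param_grid = {
    "n_estimators": [100, 200],
    "max_features": ["sqrt", 0.5],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    ExtraTreesClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="f1_macro",
    cv=5,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Hold-out evaluation with the four metrics named in the summary.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# Impurity-based feature importance: indices of the five top-ranked features.
importances = model.feature_importances_
top_features = np.argsort(importances)[::-1][:5]
print("top features:", top_features.tolist())

# Partial dependence (average curve) and ICE (one curve per sample)
# of the predicted class probabilities on the top-ranked feature.
pd_result = partial_dependence(
    model, X_train, features=[int(top_features[0])], kind="both"
)
print("PDP shape:", pd_result["average"].shape)
print("ICE shape:", pd_result["individual"].shape)
```

With a three-class model, the partial dependence output holds one curve per class, so the effect of a feature on the fatal-class probability can be read separately from its effect on the injury class.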

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage      Outcome  Tool               Model                     Prompt   Attempts  Completed
discover   success  Crossref           -                         -        1         2026-06-25
archive    success  canonical_url      -                         -        1         2026-06-26
extract    success  cached             -                         -        2         2026-06-26
clean      success  clean              -                         -        1         2026-06-25
chunk      success  chunk              -                         -        1         2026-06-25
embed      success  embed              Qwen/Qwen3-Embedding-8B   -        1         2026-06-25
promote    success  -                  -                         -        1         2026-06-25
summarize  success  llm                qwen3.6-27b-prismaquant   summ-v5  1         2026-06-26
tag        success  vector_similarity  -                         -        6         2026-06-25
verify     success  -                  -                         -        1         2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.
