Development of a prediction model for crash occurrence by analyzing traffic crash and citation data : final report.

Gonzalez-Velez, Enrique; Gonzalez-Bonilla, Armando · 2017 · ROSA P / Transportation Informatics University Transportation Center



Summary

This study addresses the critical need for improved highway safety by developing a statistical model to estimate the likelihood of a driver being involved in a traffic crash. Motivated by the global burden of road traffic injuries and deaths, particularly among young adults, the research focuses on human factors, which are widely acknowledged as primary contributors to crash occurrence. Specifically, the project investigates how driver characteristics (such as age, gender, and driving experience) and historical records of traffic violations and prior crashes influence future crash involvement. The goal was to create a predictive tool that could assist in identifying high-risk drivers and informing safety interventions.

Due to limited access to official driver record databases in Puerto Rico, the researchers collected primary data through a survey administered to a sample of the local driving population. The survey, distributed via online platforms and paper forms, gathered information on demographics, years of driving experience, daily driving hours, and detailed histories of traffic violations and crashes. After filtering for incomplete responses, a final dataset of 952 participants was analyzed.

The methodology involved preliminary analyses using contingency tables and chi-square tests of independence to assess associations between categorical variables and crash involvement. Subsequently, the researchers employed stepwise multiple logistic regression to develop and assess a prediction model, utilizing Minitab software for statistical processing.

The analysis revealed that years of driving experience, gender, and traffic violation history are significantly associated with the likelihood of being involved in a vehicle crash. The study categorized violations into moving and non-moving types and examined various specific infractions, such as speeding, driving under the influence, and illegal lane changes.
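The chi-square test of independence mentioned in the summary can be illustrated with a minimal sketch. The report's analysis was run in Minitab; the Python below is only a stand-in that shows the computation. The function names and the 2x2 counts are hypothetical and are not taken from the report. No continuity correction is applied, and the closed-form p-value used here holds only for one degree of freedom.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts for illustration only (NOT data from the report):
# rows = violation history (yes, no), columns = crash involvement (yes, no)
table = [[30, 20],
         [20, 30]]
stat, dof = chi_square_independence(table)
p = p_value_one_dof(stat)
print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p:.4f}")
```

A small p-value would suggest that the two categorical variables are not independent in the sample, which is the kind of screening the summary describes before the regression step.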
The final logistic regression model, refined through a stepwise backward elimination procedure, identified these key predictors as statistically significant factors. The model’s performance was evaluated using the Hosmer-Lemeshow test and Receiver Operating Characteristic (ROC) curves, confirming its validity in distinguishing between drivers who had been involved in crashes and those who had not.

The significance of this work lies in its contribution to the field of transportation safety by providing a validated statistical framework for crash prediction based on accessible driver data. By demonstrating that specific human factors and violation histories are strong predictors of future crash involvement, the findings support the development of targeted enforcement strategies and driver education programs. The study underscores the importance of integrating human factor analysis into highway safety planning, offering a methodological approach that can be adapted to other regions where official traffic databases may be inaccessible or incomplete.
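As a rough illustration of how an ROC curve summarizes a model's ability to separate crash-involved drivers from the rest, the sketch below computes the area under the ROC curve (AUC) directly from its rank interpretation. This is not the report's procedure; the function name, labels, and predicted probabilities are hypothetical values chosen only to show the calculation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example (NOT data from the report):
# 1 = involved in a crash, 0 = not involved; scores = predicted probabilities
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so values closer to 1.0 indicate a model that ranks crash-involved drivers above the others more consistently.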

Key finding

Years of driving experience, gender, and traffic violation history are significantly associated with the likelihood of a driver being involved in a vehicle crash.

Methodology

survey

Sample size: 952

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).

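| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
| --- | --- | --- | --- | --- | --- | --- |
| discover | success | rosap | | | 2 | 2026-05-23 |
| archive | success | | | | 1 | 2026-05-23 |
| extract | success | cached | | | 2 | 2026-06-10 |
| clean | success | | | | 1 | 2026-06-01 |
| chunk | success | | | | 1 | 2026-06-01 |
| embed | success | | | | 1 | 2026-06-02 |
| enrich | success | | | | 1 | 2026-05-23 |
| promote | success | | | | 1 | 2026-05-23 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 3 | 2026-06-10 |
| tag | success | vector_similarity | | | 19 | 2026-06-11 |
| verify | partial | | | | 2 | 2026-06-10 |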

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified_with_issues.
