Development of a prediction model for crash occurrence by analyzing traffic crash and citation data : final report.
archive: archived pipeline: cataloged verified
Summary
This study addresses the critical need for improved highway safety by developing a statistical model to estimate the likelihood of a driver being involved in a traffic crash. Motivated by the global burden of road traffic injuries and deaths, particularly among young adults, the research focuses on human factors, which are widely acknowledged as primary contributors to crash occurrence. Specifically, the project investigates how driver characteristics (such as age, gender, and driving experience) and historical records of traffic violations and prior crashes influence future crash involvement. The goal was to create a predictive tool that could assist in identifying high-risk drivers and informing safety interventions.
Due to limited access to official driver record databases in Puerto Rico, the researchers collected primary data through a survey administered to a sample of the local driving population. The survey, distributed via online platforms and paper forms, gathered information on demographics, years of driving experience, daily driving hours, and detailed histories of traffic violations and crashes. After filtering for incomplete responses, a final dataset of 952 participants was analyzed.
The methodology involved preliminary analyses using contingency tables and chi-square tests of independence to assess associations between categorical variables and crash involvement. Subsequently, the researchers employed stepwise multiple logistic regression to develop and assess a prediction model, utilizing Minitab software for statistical processing. The analysis revealed that years of driving experience, gender, and traffic violation history are significantly associated with the likelihood of being involved in a vehicle crash. The study categorized violations into moving and non-moving types and examined various specific infractions, such as speeding, driving under the influence, and illegal lane changes. The final logistic regression model, refined through a stepwise backward elimination procedure, identified these key predictors as statistically significant factors. The model's performance was evaluated using the Hosmer-Lemeshow test and Receiver Operating Characteristic (ROC) curves, confirming its validity in distinguishing between drivers who had been involved in crashes and those who had not.
The significance of this work lies in its contribution to the field of transportation safety by providing a validated statistical framework for crash prediction based on accessible driver data. By demonstrating that specific human factors and violation histories are strong predictors of future crash involvement, the findings support the development of targeted enforcement strategies and driver education programs. The study underscores the importance of integrating human factor analysis into highway safety planning, offering a methodological approach that can be adapted to other regions where official traffic databases may be inaccessible or incomplete.
Key finding
Years of driving experience, gender, and traffic violation history are significantly associated with the likelihood of a driver being involved in a vehicle crash.
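For readers unfamiliar with logistic regression, a model with these three predictors has the generic form below. This is an illustrative sketch only: the symbols and the single-variable coding of each predictor are assumptions, and this entry does not report the fitted coefficients or the exact variable coding used in the report.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0 + \beta_1 x_{\text{exp}} + \beta_2 x_{\text{gender}} + \beta_3 x_{\text{viol}},
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{\text{exp}} + \beta_2 x_{\text{gender}} + \beta_3 x_{\text{viol}})}}
\]
```

Here $p$ is the estimated probability of crash involvement, $x_{\text{exp}}$ stands for years of driving experience, $x_{\text{gender}}$ for a gender indicator, and $x_{\text{viol}}$ for a summary of violation history.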
Methodology
survey
Sample size: 952
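The two screening and evaluation steps named in the summary, a chi-square test of independence on a contingency table and an ROC-based check of discrimination, can be sketched in plain Python. This is a minimal illustration, not the study's Minitab workflow: the function names `chi_square_2x2` and `roc_auc` are hypothetical, and the counts and scores are made up rather than taken from the survey.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 d.f.) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical counts: rows = prior violation (yes, no),
# columns = crash involvement (yes, no).
violations_by_crash = [[30, 20], [20, 30]]
stat, p_value = chi_square_2x2(violations_by_crash)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")

# Hypothetical predicted crash probabilities for four drivers.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of crash-involved and non-involved drivers. The pairwise AUC is quadratic in the sample size, which is acceptable for a sample of about a thousand respondents.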
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | rosap | — | — | 2 | 2026-05-23 |
| archive | success | — | — | — | 1 | 2026-05-23 |
| extract | success | cached | — | — | 2 | 2026-06-10 |
| clean | success | — | — | — | 1 | 2026-06-01 |
| chunk | success | — | — | — | 1 | 2026-06-01 |
| embed | success | — | — | — | 1 | 2026-06-02 |
| enrich | success | — | — | — | 1 | 2026-05-23 |
| promote | success | — | — | — | 1 | 2026-05-23 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 3 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 19 | 2026-06-11 |
| verify | partial | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified_with_issues.
Topics
Ranked by relevance to this paper.
- sex gender
- incidence prevalence
- telematics crash prediction
- induced exposure
- causation analyses
- demographic disparities
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: crash risk outcomes, observational prevalence
- Methodological Resource: dataset resource