Using CHAID Decision Trees to Evaluate Severities of Missouri Truck Crashes
archive: archived | pipeline: cataloged | verified
Summary
This study investigates the factors influencing the severity of large-truck crashes in Missouri, specifically examining how these factors vary by driver gender. The research is motivated by the high frequency of fatalities and injuries associated with large-truck transport and by existing literature suggesting that gender influences crash risk and injury outcomes. With a significant shortage of truck drivers prompting increased recruitment, the authors aim to identify gender-specific behavioral predictors to enhance training programs and safety standards.
The researchers used crash data from the Missouri State Highway Patrol spanning 2002 to 2012, focusing on incidents involving Commercial Driver’s License (CDL) holders. The dataset included 52,651 crashes involving Missouri-licensed drivers, categorized by injury severity (fatality, injury, or property damage) and by contributing circumstances such as speeding, distraction, and vehicle defects. To analyze the data, the authors employed Chi-squared Automatic Interaction Detection (CHAID) decision trees, partitioning the dataset by gender. The models were built with specific stopping criteria and validated using a 75/25 training/testing split, achieving classification accuracies of 80.36% for male drivers and 79.05% for female drivers.
The results revealed distinct predictors of crash severity for each gender. For male drivers, the most significant predictors were driving too fast for conditions, driver fatigue, failing to yield, and following too closely. The decision tree for males exhibited complex interaction effects: combining "driving too fast for conditions" with "distraction/inattention" increased the probability of injury to 42.2%, while driving too fast on the wrong side of the road raised the fatality probability to 8.0%.
In contrast, female drivers showed no interaction effects between factors. Their primary predictors were following too closely, physical impairment, improper passing, and failing to yield. For females, driving too fast for conditions alone resulted in a 39.6% probability of injury, while driving on the wrong side of the road or having physical impairment led to fatality probabilities of 7.7% and 7.1%, respectively.
The study concludes that truck driver training should be tailored to address these gender-specific behavioral risks. Training for male drivers should emphasize speed management and fatigue prevention, while training for female drivers should focus on maintaining proper following distances. The authors suggest that customizing curricula based on these statistical predictors can improve driver safety and reduce crash severity. Limitations include a small sample size for female drivers; future research is recommended to combine datasets across states and to incorporate environmental factors such as weather and road conditions to further refine safety recommendations.
Key finding
Male CDL drivers' crash severity is primarily predicted by driving too fast for conditions and driver fatigue with significant interaction effects, while female CDL drivers' severity is predicted by following too closely and physical impairment with no significant interaction effects.
Methodology
dataset
Sample size: 52645
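The summary names CHAID as the modeling technique. As a rough illustration of its core idea, the sketch below scores each candidate predictor by the Pearson chi-squared statistic of its contingency table against crash severity and splits on the strongest one. This is not the authors' implementation: the column names (`too_fast`, `vehicle_defect`, `severity`) and all counts are hypothetical, invented only for the example. It also simplifies CHAID by assuming binary predictors with equal degrees of freedom, so ranking by the raw statistic stands in for ranking by adjusted p-value, and it omits category merging, Bonferroni adjustment, and stopping criteria.

```python
from collections import Counter


def chi_square(rows, predictor, target):
    """Pearson chi-squared statistic for the predictor-by-target table."""
    n = len(rows)
    pred_counts = Counter(r[predictor] for r in rows)
    targ_counts = Counter(r[target] for r in rows)
    cell_counts = Counter((r[predictor], r[target]) for r in rows)
    stat = 0.0
    for p, p_n in pred_counts.items():
        for t, t_n in targ_counts.items():
            expected = p_n * t_n / n  # count expected under independence
            observed = cell_counts.get((p, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def best_split(rows, predictors, target):
    """Return the predictor with the largest statistic, plus all scores."""
    scores = {p: chi_square(rows, p, target) for p in predictors}
    return max(scores, key=scores.get), scores


def make_rows(too_fast, defect, severity, count):
    """Build `count` identical hypothetical crash records."""
    return [
        {"too_fast": too_fast, "vehicle_defect": defect, "severity": severity}
        for _ in range(count)
    ]


# Hypothetical toy data (NOT the study's data): severity depends on
# too_fast but is independent of vehicle_defect by construction.
rows = (
    make_rows(1, 1, "injury", 20) + make_rows(1, 0, "injury", 20)
    + make_rows(1, 1, "property_damage", 30) + make_rows(1, 0, "property_damage", 30)
    + make_rows(0, 1, "injury", 10) + make_rows(0, 0, "injury", 10)
    + make_rows(0, 1, "property_damage", 40) + make_rows(0, 0, "property_damage", 40)
)

best, scores = best_split(rows, ["too_fast", "vehicle_defect"], "severity")
print(best, scores)
```

A full CHAID tree would repeat this selection within each resulting subgroup, which is how interaction effects such as the speeding-plus-distraction branch reported for male drivers can emerge.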
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | rosap | — | — | 2 | 2026-05-23 |
| archive | success | — | — | — | 1 | 2026-05-23 |
| extract | success | cached | — | — | 2 | 2026-06-10 |
| clean | success | — | — | — | 1 | 2026-06-01 |
| chunk | success | — | — | — | 1 | 2026-06-01 |
| embed | success | — | — | — | 1 | 2026-06-02 |
| enrich | success | — | — | — | 1 | 2026-05-23 |
| promote | success | — | — | — | 1 | 2026-05-23 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 3 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 19 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Topics
Ranked by relevance to this paper.
- sex gender
- pre crash contributing factors
- demographic disparities
- incidence prevalence
- causation analyses
- naturalistic crash near crash
Information type
What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).
- Empirical Findings: crash risk outcomes
- Methodological Resource: dataset resource