Pilot Study on Improving Crash Data Accuracy in Kentucky through University Collaboration

Fields, Michael A.; Green, Eric; Kluger, Robert; Zhang, Xu; Haleem, Kirolos · 2024 · ROSA P / Kentucky Transportation Cabinet

Archive: archived · Pipeline: cataloged, verified

Summary

This pilot study addresses the challenge of maintaining high-quality crash data in Kentucky, where more than 150,000 crash reports are generated annually. Accurate data is essential for evidence-based road safety strategies, yet manually reviewing every report is impractical. The research targets discrepancies between the structured coded data and the unstructured crash narratives written by law enforcement officers. It aims to develop efficient methods for identifying these inconsistencies and to evaluate machine learning algorithms that can automate the interpretation of narrative text.

Researchers from the University of Kentucky, University of Louisville, and Western Kentucky University collaborated to create a web-based Quality Control Tool (QCT) to facilitate manual reviews. The study analyzed a sample of approximately 8,000 crash narratives from calendar year 2020, excluding fatal and property-damage-only crashes. Seven trained student reviewers and one faculty member used the QCT to check 20 crash attributes, such as travel direction, aggressive driving, and intersection status, against the narrative text. The tool recorded review times and responses, allowing researchers to assess inter-reviewer consistency and identify which attributes were hardest to verify.

Following the manual review phase, the team developed a logistic regression model in Python to classify crash attributes automatically from narrative text. This custom model was compared against Google’s BERT (Bidirectional Encoder Representations from Transformers) language model to determine which performed better at text mining.

The manual review revealed significant inconsistencies between narratives and coded data, particularly for aggressive driving, distracted driving, intersection crashes, secondary crashes, and travel direction. Reviewers found it difficult to validate distracted driving but could determine the manner of collision in 99% of narratives. Average review times ranged from 1.57 to 3.51 minutes per report, with student reviewers averaging 2.8 minutes. Agreement was generally high among the student reviewers but lower between the students and the faculty reviewer.

In the machine learning comparison, the custom logistic regression model produced broadly similar results to BERT, but BERT scored higher on every reported performance metric, including accuracy, precision, and recall. The study concludes that manual reviews are valuable for identifying specific data quality issues but do not scale to the full volume of crash reports. The findings suggest that future crash data quality control efforts should prioritize advanced AI models such as BERT for automated narrative interpretation, since BERT outperformed the logistic regression approach here. The authors also note that models must be updated continually with the latest AI advancements to remain effective. These results give transportation agencies a framework for systematically improving crash reporting accuracy and supporting more reliable safety analysis.
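The inter-reviewer consistency check mentioned in the summary can be illustrated with a small sketch. The summary does not say which agreement statistic the study used, so the choice of simple percent agreement and Cohen's kappa here is an assumption, and the response labels and the `student` and `faculty` lists are hypothetical values, not data from the report.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of reports on which two reviewers gave the same response."""
    if len(a) != len(b) or not a:
        raise ValueError("both reviewers need the same non-empty set of reports")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability both reviewers pick the same label
    # if each drew independently from their own label frequencies.
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both reviewers used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical responses for one attribute on six reports (not study data).
student = ["match", "match", "mismatch", "match", "unclear", "mismatch"]
faculty = ["match", "mismatch", "mismatch", "match", "unclear", "match"]

agreement = percent_agreement(student, faculty)
kappa = cohens_kappa(student, faculty)
print(f"percent agreement: {agreement:.3f}, Cohen's kappa: {kappa:.3f}")
```

Kappa is lower than raw agreement because it discounts the matches two reviewers would reach by chance, which matters when one response dominates an attribute.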

Key finding

Google's BERT language model outperformed a custom logistic regression model on accuracy, precision, and recall when classifying crash attributes from unstructured narrative text, although the two models produced broadly similar results.
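As a rough illustration of how two classifiers can be compared on accuracy, precision, and recall for a single yes/no crash attribute, the sketch below scores two sets of predictions against reviewer labels. Everything in it is hypothetical: the function name `binary_metrics`, the label lists, and the prediction lists are made up for the example and do not reproduce the study's data or results.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical reviewer labels: 1 = narrative indicates the attribute, 0 = it does not.
reviewer_labels = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical predictions from two candidate models (not the study's outputs).
preds_a = [1, 0, 0, 1, 0, 1, 1, 0]
preds_b = [1, 0, 1, 1, 0, 0, 1, 1]

metrics_a = binary_metrics(reviewer_labels, preds_a)
metrics_b = binary_metrics(reviewer_labels, preds_b)
for name, m in (("model A", metrics_a), ("model B", metrics_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Reporting all three metrics rather than accuracy alone is useful for attributes that are rare in the data, where a model can be accurate overall while missing most positive cases.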

Methodology

dataset

Sample size: approximately 8,000 crash narratives (calendar year 2020)

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	rosap	—	—	2	2026-05-23
archive	success	—	—	—	1	2026-05-23
extract	success	cached	—	—	2	2026-06-10
clean	success	—	—	—	1	2026-06-01
chunk	success	—	—	—	1	2026-06-01
embed	success	—	—	—	1	2026-06-02
enrich	success	—	—	—	1	2026-05-23
promote	success	—	—	—	1	2026-05-23
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	3	2026-06-10
tag	success	vector_similarity	—	—	19	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: crash risk outcomes
Methodological Resource: dataset resource