Vehicle Consumer Complaint Reports Involving Severe Incidents: Mining Large Contingency Tables

Das, Subasish; Mudgal, Abhisek; Dutta, Anandi; Geedipally, Srinivas Reddy · 2018 · Transportation Research Record Journal of the Transportation Research Board

DOI: 10.1177/0361198118788464

Summary

This study addresses the safety implications of pre-existing vehicle manufacturing defects, which contribute to approximately 6.35% of fatal crashes according to Fatality Analysis Reporting System (FARS) data. Motivated by the rising economic and social costs of traffic crashes and the limitations of conventional crash data in detailing vehicle defects, the research aims to identify latent trends in consumer complaints and detect clusters of vehicle models with high relative reporting ratios for severe incidents. The authors utilize the National Highway Traffic Safety Administration’s (NHTSA) vehicle complaint database, which contains over 1.37 million reports, focusing specifically on the 67,201 reports involving injury or fatalities.

The methodology employs two primary analytical techniques: exploratory text mining and Empirical Bayes (EB) data mining. Text mining, including sentence stemming and comparison word clouds, was used to identify key risk factors and themes within complaint texts, contrasting data from NHTSA complaints with FARS data to mitigate reporting bias. EB data mining, specifically the EB geometric mean (EBGM) method, was applied to a large contingency table linking vehicle models (make, model, year) with specific component defects. This approach accounts for sampling variation and reporting bias by shrinking high relative reporting ratios that result from low expected counts, thereby identifying statistically significant associations between specific vehicle models and defect types.

Results from the text mining analysis indicate that major vehicular defects are predominantly associated with air bags, brake systems, seat belts, speed controls, and tires/wheels. These five categories account for 62% to 77% of reports depending on the dataset subset. Comparison word clouds revealed that terms like "structure," "engine," and "visibility" were more dominant in crash-related complaints, whereas "air bags" and "speed" were common in both crash and non-crash reports. The EB analysis identified specific "vehicle model with major defect" groups with unusually high reporting rates. For instance, the combination of the 2002 Ford Explorer and structural body defects (hinge and attachments) had an EBGM score of 55.14, indicating it occurred 55 times more frequently than expected. Notably, most high-risk combinations involved older vehicle models, likely due to longer exposure time.

The study concludes that consumer complaint data is a valuable resource for identifying major vehicular defects and specific high-risk vehicle model groups that require further investigation. By bridging the gap between consumer reports and fatal crash data, the findings offer insights into defect-related crash risks. The authors emphasize the significance of this research for enhancing traffic safety strategies, particularly in the context of advancing connected and automated vehicle technologies, where understanding mechanical failure modes is crucial. The study demonstrates that while vehicle defects are a contributing factor in a small percentage of crashes, targeted analysis of complaint data can reveal specific, actionable safety concerns.
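
The shrinkage idea behind the EB step can be illustrated with a minimal sketch. It is not the authors' implementation: EBGM as usually described fits a mixture of two gamma priors to the data, whereas this sketch assumes a single fixed Gamma(alpha, beta) prior with mean 1, and all counts are made-up numbers. The function names (`expected_counts`, `shrunk_ratio`, `digamma`) are hypothetical helpers introduced only for this illustration. The expected count for each cell is taken under independence of vehicle model and defect component, and the score is the geometric mean of the posterior for the true reporting ratio.

```python
import math


def digamma(x):
    """Digamma function for x > 0: upward recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def expected_counts(table):
    """Expected cell counts if rows (models) and columns (defects) were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def shrunk_ratio(n, e, alpha=1.0, beta=1.0):
    """Geometric-mean estimate of the reporting ratio under an assumed Gamma prior.

    With n ~ Poisson(lambda * e) and lambda ~ Gamma(alpha, rate=beta), the
    posterior is Gamma(alpha + n, rate=beta + e), and exp(E[log lambda]) is
    exp(digamma(alpha + n) - log(beta + e)).
    """
    return math.exp(digamma(alpha + n) - math.log(beta + e))


# Made-up contingency table: rows are vehicle models, columns are components.
toy_table = [[30, 5, 1], [10, 40, 2], [3, 4, 25]]
toy_expected = expected_counts(toy_table)

# Two hypothetical cells with the same raw ratio n / e = 10 but different support.
sparse_cell = shrunk_ratio(2, 0.2)
dense_cell = shrunk_ratio(200, 20.0)
print("sparse cell:", round(sparse_cell, 2), "dense cell:", round(dense_cell, 2))
```

Both hypothetical cells have a raw relative reporting ratio of 10, but the cell with an expected count of 0.2 is pulled much closer to 1 than the cell with an expected count of 20. This is the behavior the summary describes: high ratios that rest on low expected counts are discounted.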

Key finding

Major vehicular defects associated with severe incidents are concentrated in air bags, brake systems, seat belts, and speed controls, with empirical Bayes analysis identifying specific vehicle models with statistically significant high reporting ratios for these defects.

Methodology

dataset

Sample size: 67201

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via author_sweep_intake on 2026-05-28.

| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | author_sweep | | | 2 | 2026-05-28 |
| archive | success | canonical_url | | | 7 | 2026-06-06 |
| extract | success | cached | | | 3 | 2026-06-10 |
| clean | success | clean | | | 1 | 2026-06-04 |
| chunk | success | chunk | | | 1 | 2026-06-04 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | | 1 | 2026-06-04 |
| enrich | skipped | | | | 3 | 2026-06-04 |
| promote | success | | | | 1 | 2026-06-04 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 2 | 2026-06-10 |
| tag | success | vector_similarity | | | 15 | 2026-06-11 |
| verify | success | | | | 2 | 2026-06-10 |

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
