How Does Variation in AI Performance Affect Trust in AI-infused Systems: A Case Study With In-Vehicle Systems

Gu, Feiqi; Xu, Haosong; He, Dengbo · 2024 · Proceedings of the Human Factors and Ergonomics Society Annual Meeting

archive: archived · pipeline: cataloged · verified


Summary

This study investigates how variations in the performance of AI-infused systems (AIS), specifically in-vehicle voice control systems (VCS), influence user trust. The research is motivated by the prevalence of Over-The-Air (OTA) updates, which can cause unstable system performance due to algorithmic uncertainty or temporary downgrades in specific functions. While prior research has focused on trust dynamics in safety-critical systems or those with infrequent failures, this study addresses low-risk, high-frequency interaction systems where trust is built over extended periods. The authors aim to determine the relationship between perceived system reliability and trust, and to understand how performance fluctuations during system evolution affect user attitudes.

To address these questions, the researchers employed a Wizard of Oz simulation involving 27 participants. The experiment simulated a VCS in a smart cabin environment, where an experimenter manually played pre-recorded responses to participant queries to precisely control the Actual Correct Rate (ACR). Participants interacted with the system in three batches of 10 queries each, with simulated system upgrades occurring between batches. The ACR was manipulated across three levels (50%, 70%, and 90%), creating 27 possible performance trajectories. After each batch, participants reported their trust in the system (on a 1–7 scale) and their Perceived Correct Rate (PCR). Statistical analysis utilized four mixed linear models to examine how current and historical performance variations influenced PCR and trust, accounting for individual differences in initial trust propensity.

The results indicate that users’ PCR is significantly influenced by the current system’s actual performance but not by previous versions’ performance. Crucially, user trust in the current system version is positively associated with the current PCR, rather than the actual performance itself. The study found that the pattern of system evolution impacts trust, but this effect diminishes over time. Specifically, while the change in PCR between the current and previous versions interacts with the previous PCR to influence trust, the performance of the first version had no significant effect on trust in the third version. This suggests that users’ impressions of earlier system states fade, likely due to working memory limitations, meaning recent performance experiences carry more weight than historical ones.

The significance of these findings lies in their implications for the design and management of AI-infused systems. The study underscores that managing user perception of performance is critical for maintaining trust, as perceived accuracy outweighs actual accuracy in shaping user attitudes. For system designers, particularly in smart cabins, these results suggest that strategies for OTA updates should consider the marginal effects of performance changes relative to previous versions. Designers may need to balance optimization costs against the risk of diminishing trust through large performance variations. Furthermore, the finding that historical performance effects fade over time suggests that recent interactions are paramount for trust calibration. The authors note limitations regarding the remote, non-safety-critical nature of the study and recommend future validation in broader contexts, such as driving automation.
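The design arithmetic in the summary (three ACR levels over three batches of 10 queries) can be checked with a short sketch. This is an illustration, not the authors' code: the names `ACR_LEVELS`, `enumerate_trajectories`, and `correct_counts` are hypothetical, and the summary does not state how trajectories were assigned to participants.

```python
from itertools import product

# Values taken from the summary; all identifiers are hypothetical.
ACR_LEVELS = (50, 70, 90)   # Actual Correct Rate levels, in percent
BATCHES = 3                 # one batch per simulated system version
QUERIES_PER_BATCH = 10


def enumerate_trajectories(levels=ACR_LEVELS, batches=BATCHES):
    """Return every ordered sequence of ACR levels across the batches."""
    return list(product(levels, repeat=batches))


def correct_counts(trajectory, queries=QUERIES_PER_BATCH):
    """Number of correct responses per batch implied by each ACR level."""
    return tuple(acr * queries // 100 for acr in trajectory)


trajectories = enumerate_trajectories()
print(len(trajectories))               # 3 levels ** 3 batches
print(correct_counts((50, 90, 70)))    # correct responses in each batch
```

With 10 queries per batch, every ACR level maps to a whole number of correct responses (5, 7, or 9), which is what lets an experimenter hit the target rate exactly in a Wizard of Oz setup.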

Key finding

User trust in AI-infused systems is determined by perceived performance rather than actual performance, and the influence of past performance variations on trust fades over time.

Methodology

simulation_modeling

Sample size: 27

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	—	—	—	1	2026-05-28
archive	success	canonical_url	—	—	1	2026-06-06
extract	success	cached	—	—	3	2026-06-10
clean	success	clean	—	—	1	2026-06-04
chunk	success	chunk	—	—	1	2026-06-04
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-04
enrich	skipped	—	—	—	3	2026-06-04
promote	success	—	—	—	1	2026-06-04
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	2	2026-06-10
tag	success	vector_similarity	—	—	15	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.