How Does Variation in AI Performance Affect Trust in AI-infused Systems: A Case Study With In-Vehicle Systems
DOI: 10.1177/10711813241274423
Archive status: archived. Pipeline status: cataloged, verified.
Summary
This study investigates how variations in the performance of AI-infused systems (AIS), specifically in-vehicle voice control systems (VCS), influence user trust. The research is motivated by the prevalence of Over-The-Air (OTA) updates, which can make system performance unstable through algorithmic uncertainty or temporary downgrades in specific functions. Prior research has focused on trust dynamics in safety-critical systems or in systems with infrequent failures; this study instead addresses low-risk, high-frequency interaction systems where trust is built over extended periods. The authors aim to determine the relationship between perceived system reliability and trust, and to understand how performance fluctuations during system evolution affect user attitudes.

To address these questions, the researchers ran a Wizard of Oz simulation with 27 participants. The experiment simulated a VCS in a smart cabin environment: an experimenter manually played pre-recorded responses to participant queries, which allowed precise control of the Actual Correct Rate (ACR). Participants interacted with the system in three batches of 10 queries each, with a simulated system upgrade between batches. The ACR was manipulated across three levels (50%, 70%, and 90%), creating 27 possible performance trajectories. After each batch, participants reported their trust in the system (on a 1–7 scale) and their Perceived Correct Rate (PCR). The statistical analysis used four mixed linear models to examine how current and historical performance variations influenced PCR and trust, accounting for individual differences in initial trust propensity.

The results indicate that users’ PCR is significantly influenced by the current system’s actual performance but not by the performance of previous versions. Crucially, user trust in the current system version is positively associated with the current PCR rather than with the actual performance itself.
The study found that the pattern of system evolution affects trust, but that this effect diminishes over time. Specifically, the change in PCR between the current and previous versions interacts with the previous PCR to influence trust, whereas the performance of the first version had no significant effect on trust in the third version. This suggests that users’ impressions of earlier system states fade, likely due to working memory limitations, so recent performance experiences carry more weight than historical ones.

These findings have implications for the design and management of AI-infused systems. The study underscores that managing users’ perception of performance is critical for maintaining trust, because perceived accuracy outweighs actual accuracy in shaping user attitudes. For system designers, particularly in smart cabins, the results suggest that OTA update strategies should consider the marginal effects of performance changes relative to previous versions. Designers may need to balance optimization costs against the risk of diminishing trust through large performance variations. Furthermore, the finding that historical performance effects fade over time suggests that recent interactions are paramount for trust calibration. The authors note limitations regarding the remote, non-safety-critical nature of the study and recommend future validation in broader contexts, such as driving automation.
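The interaction described in the summary can be written out as a rough sketch. The form below is an assumption made for illustration only: this entry does not give the paper's exact model specification, so the coefficient names, the participant-level random intercept $u_i$, and the treatment of initial trust propensity $P_i$ as a covariate are all hypothetical.

```latex
% Illustrative form only; not the paper's reported specification.
% i indexes participants, t indexes system versions (batches).
\begin{align}
  \Delta\mathrm{PCR}_{i,t} &= \mathrm{PCR}_{i,t} - \mathrm{PCR}_{i,t-1} \\
  \mathrm{Trust}_{i,t} &= \beta_0
      + \beta_1\,\mathrm{PCR}_{i,t-1}
      + \beta_2\,\Delta\mathrm{PCR}_{i,t}
      + \beta_3\,\bigl(\mathrm{PCR}_{i,t-1} \times \Delta\mathrm{PCR}_{i,t}\bigr)
      + \beta_4\,P_i
      + u_i + \varepsilon_{i,t}
\end{align}
```

Under this reading, a nonzero $\beta_3$ would mean that the effect of a given change in perceived performance on trust depends on how well the previous version was perceived to perform.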
Key finding
User trust in AI-infused systems is associated with perceived performance rather than actual performance, and the influence of past performance variations on trust fades over time.
Methodology
simulation_modeling
Sample size: 27
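As a minimal sketch of the design described in the summary, the snippet below enumerates the possible ACR trajectories (three levels across three batches) and the number of correct responses each level implies for a 10-query batch. The function and constant names are hypothetical, and the snippet makes no claim about how trajectories were assigned to participants.

```python
from itertools import product

# Values taken from the summary; the names are illustrative assumptions.
ACR_LEVELS = (0.5, 0.7, 0.9)   # Actual Correct Rate levels
N_BATCHES = 3                  # one batch per simulated system version
QUERIES_PER_BATCH = 10


def enumerate_trajectories(levels=ACR_LEVELS, n_batches=N_BATCHES):
    """Return every ordered sequence of ACR levels across the batches."""
    return list(product(levels, repeat=n_batches))


def correct_responses(trajectory, queries=QUERIES_PER_BATCH):
    """Number of correct responses per batch implied by each ACR level."""
    return [round(acr * queries) for acr in trajectory]


trajectories = enumerate_trajectories()
print(len(trajectories))                    # 3 ** 3 = 27
print(correct_responses((0.5, 0.7, 0.9)))   # [5, 7, 9]
```

The count of 27 matches the number of possible performance trajectories stated in the summary.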
Provenance
The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.
| Stage | Outcome | Tool | Model | Prompt | Attempts | Completed |
|---|---|---|---|---|---|---|
| discover | success | — | — | — | 1 | 2026-05-28 |
| archive | success | canonical_url | — | — | 1 | 2026-06-06 |
| extract | success | cached | — | — | 3 | 2026-06-10 |
| clean | success | clean | — | — | 1 | 2026-06-04 |
| chunk | success | chunk | — | — | 1 | 2026-06-04 |
| embed | success | embed | Qwen/Qwen3-Embedding-8B | — | 1 | 2026-06-04 |
| enrich | skipped | — | — | — | 3 | 2026-06-04 |
| promote | success | — | — | — | 1 | 2026-06-04 |
| summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 2 | 2026-06-10 |
| tag | success | vector_similarity | — | — | 15 | 2026-06-11 |
| verify | success | — | — | — | 2 | 2026-06-10 |
Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
Topics
Ranked by relevance to this paper.