Low-Cost Multi-Agent Navigation via Reinforcement Learning With Multi-Fidelity Simulator

Qiu, Jiantao; Yu, Chao; Liu, Weiling; Yang, Tianxiang; Yu, Jincheng; Wang, Yu; Yang, Huazhong · 2021 · DOAJ

DOI: 10.1109/ACCESS.2021.3085328

archive: archived · pipeline: cataloged · verified


Summary

This paper addresses the high computational cost associated with training multi-agent reinforcement learning (MARL) algorithms for navigation tasks using high-fidelity simulators. While high-fidelity simulators are necessary to bridge the gap between simulation and real-world application, their intensive resource consumption creates a bottleneck for model-free RL algorithms, which require vast amounts of data. The authors propose the Multi-Fidelity Simulator framework for Multi-Agent Reinforcement Learning (MFS-MARL) to reduce total data costs by leveraging samples from a low-fidelity simulator.

The proposed framework integrates a high-fidelity simulator (HF-Sim) for primary training with a low-fidelity simulator (LF-Sim) to solve local navigation difficulties. The system employs a local puzzle detector that identifies when agents collide or become stuck in the HF-Sim. Upon detection, the state information is transferred to the LF-Sim, where a depth-first search (DFS) algorithm generates a local feasible policy. This expert policy is then combined with the original RL policy via a policy mixer to guide the agents out of the collision state in the HF-Sim, thereby improving exploration efficiency. The method utilizes a backtrack-based local Markov decision process decomposition to limit the search space, ensuring the DFS remains computationally tractable.

Experiments were conducted using a multi-vehicle simulator with variable fidelity levels defined by laser sensor resolution (e.g., 128 rays for high fidelity vs. 0–32 rays for low fidelity). The authors measured the performance gap and data cost, establishing a cost ratio of 36:1 between the high- and low-fidelity simulators. Results demonstrated that MFS-MARL effectively obtains local feasible policies and reduces the total data cost by 23% compared to vanilla Soft Actor-Critic (SAC). Furthermore, the method achieved convergence speeds comparable to well-trained expert policy methods.

The study confirms that while training solely on low-fidelity data harms final performance, using it strategically to resolve local puzzles significantly lowers the overall training overhead without compromising the final policy's effectiveness in high-fidelity environments.
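
The detect, search, and mix steps described in the summary can be illustrated with a small toy sketch. This is not the authors' implementation. Every name below (`is_local_puzzle`, `dfs_plan`, `mix_policy`, `total_cost`) is hypothetical, the low-fidelity simulator is reduced to a coarse occupancy grid, a plain depth bound stands in for the paper's backtrack-based local MDP decomposition, and the mixer is a simple override rule that may differ from the paper's. The cost function assumes that the 36:1 ratio means one HF-Sim sample costs as much as 36 LF-Sim samples.

```python
# Illustrative sketch only: every name here is hypothetical, not the paper's code.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def is_local_puzzle(positions, collided, window=4):
    """Flag a local puzzle on collision, or when the agent has not moved
    over the last `window` recorded positions (a stand-in for 'stuck')."""
    if collided:
        return True
    recent = positions[-window:]
    return len(recent) == window and len(set(recent)) == 1


def dfs_plan(grid, start, goal, max_depth):
    """Depth-limited DFS with backtracking on a coarse occupancy grid
    (0 = free, 1 = obstacle). Returns a list of move names, or None."""
    on_path = {start}

    def visit(pos, depth):
        if pos == goal:
            return []
        if depth == max_depth:
            return None  # depth bound keeps the local search small
        for name, (dr, dc) in MOVES.items():
            r, c = pos[0] + dr, pos[1] + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if not inside or grid[r][c] == 1 or (r, c) in on_path:
                continue
            on_path.add((r, c))
            tail = visit((r, c), depth + 1)
            on_path.discard((r, c))  # backtrack before trying the next move
            if tail is not None:
                return [name] + tail
        return None

    return visit(start, 0)


def mix_policy(rl_action, expert_plan):
    """Assumed mixer: follow the expert plan while it lasts, then the RL action."""
    return expert_plan.pop(0) if expert_plan else rl_action


def total_cost(hf_samples, lf_samples, cost_ratio=36.0):
    """Total data cost in HF-sample units under the assumed 36:1 ratio."""
    return hf_samples + lf_samples / cost_ratio


if __name__ == "__main__":
    grid = [[0, 1, 0],
            [0, 1, 0],
            [0, 0, 0]]
    stuck = is_local_puzzle([(0, 0)] * 4, collided=False)
    plan = dfs_plan(grid, (0, 0), (0, 2), max_depth=6) if stuck else []
    print(plan)  # ['down', 'down', 'right', 'right', 'up', 'up']
    print(mix_policy("left", plan))  # expert action overrides the RL action
```

The search tracks only the cells on the current path and releases them on backtrack, so a cell reached too deep along one route can still be used by a shorter route. That is exhaustive within the depth bound, which is affordable only because the local problem is kept small.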

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage      Outcome  Tool               Model                     Prompt   Attempts  Completed
discover   success  DOAJ               -                         -        1         2026-06-25
archive    success  unpaywall          -                         -        1         2026-06-26
extract    success  cached             -                         -        2         2026-06-26
clean      success  clean              -                         -        1         2026-06-25
chunk      success  chunk              -                         -        1         2026-06-25
embed      success  embed              Qwen/Qwen3-Embedding-8B   -        1         2026-06-25
promote    success  -                  -                         -        1         2026-06-25
summarize  success  llm                qwen3.6-27b-prismaquant   summ-v5  1         2026-06-26
tag        success  vector_similarity  -                         -        6         2026-06-25
verify     success  -                  -                         -        1         2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.