Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation

Zheng, Han; Ma, Yining; Araki, Brandon; Chen, Jingkai; Wu, Cathy · 2026 · Crossref

DOI: 10.1613/jair.1.20611

archive: archived · pipeline: cataloged, verified


Summary

This paper addresses lifelong Multi-Agent Path Finding (MAPF) in warehouse automation, where autonomous mobile robots must continuously navigate conflict-free paths to maximize system throughput. While classical search-based solvers like Conflict-Based Search are effective for static scenarios, they struggle with the scalability and long-term dynamics of lifelong settings. Existing machine learning approaches have failed to consistently outperform these search-based methods. To bridge this gap, the authors introduce Reinforcement Learning-guided Rolling Horizon Prioritized Planning (RL-RH-PP), the first framework to integrate reinforcement learning with search-based planning for lifelong MAPF.

The proposed method uses Prioritized Planning (PP) as a computationally efficient backbone, decomposing the multi-agent problem into sequential single-agent plans based on a priority order. The authors extend PP into a rolling-horizon framework (RH-PP) that replans paths within discrete time windows as new tasks arrive. Dynamic priority assignment is formulated as a Partially Observable Markov Decision Process (POMDP). An attention-based transformer neural network serves as the RL policy, capturing both spatial and temporal dependencies among agents to autoregressively decode priority orders on the fly. This allows the system to sample promising priority orders, which are then executed by the PP planner, with a conflict-repair mechanism ensuring collision-free paths within the planning horizon.

Experiments were conducted in realistic warehouse simulations, including novel benchmarks based on Symbotic warehouse layouts characterized by high obstacle density and bottlenecks. RL-RH-PP achieved the highest total throughput among all baselines, demonstrating an average 25% improvement over RH-PP with random priority orders. The framework exhibited strong zero-shot generalization across varying agent densities, planning horizons, and unseen map layouts.

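
The prioritized-planning backbone described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grid encoding, the function names `plan_single_agent` and `prioritized_planning`, the breadth-first space-time search, and the fixed `horizon` are all assumptions made for illustration. Agents are planned one at a time in the given priority order, and each finished path is reserved so that lower-priority agents must route around it.

```python
from collections import deque

MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait, down, up, right, left


def plan_single_agent(grid, start, goal, cell_res, move_res, horizon):
    """Space-time BFS for one agent; returns horizon + 1 cells, or None."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0, (start,))])
    seen = {(start, 0)}
    while queue:
        cell, t, path = queue.popleft()
        # Accept the goal only if the agent can park there until the window ends.
        if cell == goal and all((goal, k) not in cell_res for k in range(t, horizon + 1)):
            return list(path) + [goal] * (horizon - t)
        if t == horizon:
            continue
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue  # off the map
            if grid[nxt[0]][nxt[1]] == "#":
                continue  # obstacle
            if (nxt, t + 1) in cell_res:
                continue  # vertex conflict with a higher-priority agent
            if (nxt, cell, t) in move_res:
                continue  # edge (swap) conflict with a higher-priority agent
            if (nxt, t + 1) not in seen:
                seen.add((nxt, t + 1))
                queue.append((nxt, t + 1, path + (nxt,)))
    return None


def prioritized_planning(grid, starts, goals, order, horizon):
    """Plan agents sequentially in priority order (highest priority first)."""
    cell_res, move_res, paths = set(), set(), {}
    for agent in order:
        path = plan_single_agent(grid, starts[agent], goals[agent],
                                 cell_res, move_res, horizon)
        if path is None:
            return None  # this priority order fails within the window
        paths[agent] = path
        # Reserve the path so that later (lower-priority) agents avoid it.
        for t, cell in enumerate(path):
            cell_res.add((cell, t))
            if t + 1 < len(path):
                move_res.add((cell, path[t + 1], t))
    return paths


# Toy map (hypothetical): "#" marks obstacles, "." marks free cells.
grid = ["....",
        ".##.",
        "...."]
starts = {"a": (0, 0), "b": (0, 3)}
goals = {"a": (0, 3), "b": (0, 0)}
paths = prioritized_planning(grid, starts, goals, order=["a", "b"], horizon=10)
for agent, path in paths.items():
    print(agent, path)
```

With the order `["a", "b"]`, agent `a` takes the short route along the top row and agent `b` has to detour around the obstacle block. In general, a poor order can leave a later agent without any feasible path inside the window, which is why the choice of priority order matters.
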
Interpretive analyses revealed that the learned policy proactively assigns higher priorities to agents in congested regions and strategically redirects others away from bottlenecks, thereby easing traffic flow and recovering from potential deadlocks.

The significance of this work lies in demonstrating that learning-guided approaches can effectively augment traditional heuristics in complex, dynamic environments. By focusing the learning component on optimizing global priority orders while relying on the efficiency of prioritized planning for path computation, RL-RH-PP achieves a balance of scalability and solution quality. The findings suggest that integrating deep reinforcement learning with lightweight search-based planners is a viable strategy for improving coordination in large-scale warehouse automation and other multi-robot systems.
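
The division of labor described above, where a learned policy proposes priority orders and the planner computes the paths, can be sketched by decoding an order one agent at a time. In the Python sketch below, `score_fn` is a hypothetical stand-in for the paper's transformer policy, the `congestion` values are made-up numbers rather than results from the paper, and `decode_priority_order` is an illustrative name, not the authors' API.

```python
import math
import random


def decode_priority_order(agents, score_fn, rng, greedy=False):
    """Pick agents one by one; each pick may depend on the agents chosen so far."""
    remaining = list(agents)
    chosen = []
    while remaining:
        scores = [score_fn(agent, chosen) for agent in remaining]
        if greedy:
            index = scores.index(max(scores))
        else:
            # Softmax sampling over the agents that are still unassigned.
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]
            index = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(index))
    return chosen


# Made-up congestion levels standing in for learned features (not from the paper).
congestion = {"a": 0.2, "b": 1.5, "c": 0.9}


def score_fn(agent, chosen):
    # Hypothetical policy stand-in: favor agents in more congested regions.
    # A learned policy would also condition on `chosen`; this toy one ignores it.
    return congestion[agent]


rng = random.Random(0)
greedy_order = decode_priority_order(congestion, score_fn, rng, greedy=True)
sampled_orders = [decode_priority_order(congestion, score_fn, rng) for _ in range(4)]
print(greedy_order)
print(sampled_orders)
```

Greedy decoding with these toy scores yields `['b', 'c', 'a']`, placing the agent in the most congested region first. Each sampled order is a candidate that a planner could evaluate, keeping whichever order produces the best plan.
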

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	Crossref	—	—	1	2026-06-18
archive	success	canonical_url	—	—	1	2026-06-25
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-19
chunk	success	chunk	—	—	1	2026-06-19
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-19
promote	success	—	—	—	1	2026-06-18
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-19
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

last mile delivery