Reinforcement learning for shared autonomy drone landings

Backman, Kal; Kulić, Dana; Chung, Hoam · 2023 · Crossref

DOI: 10.1007/s10514-023-10143-3

archive: archived · pipeline: cataloged · verified

Summary

This paper addresses the difficulty novice pilots face when landing unmanned aerial vehicles (UAVs), particularly in environments with poor depth perception and limited safe landing zones. To mitigate these challenges, the authors propose a shared autonomy system that assists pilots by augmenting their control inputs. The system is designed to operate without prior knowledge of the environment or the pilot’s specific goal, relying instead on inferring intent from pilot actions. The approach consists of two main modules: a perception module that encodes visual data into a compressed latent representation, and a policy module trained via reinforcement learning to provide corrective control inputs.

The perception module utilizes a Cross-Modal Variational Auto-Encoder (CM-VAE) to process inputs from two downward-facing RGB-D cameras. It employs a novel camera projection model to combine stereo images into a single representation, encoding scene structure and safe landing areas into a low-dimensional latent vector. This design maximizes the field of view along the depth axis and improves robustness against sensor noise through dynamic noise generation during training.

The policy module uses the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. It is trained in simulation using a population of simulated users characterized by four parameters: conformance to the assistant, proficiency, aggressiveness, and speed. These parameters model how pilots operate joystick controllers and react to disturbances. A key methodological innovation is reformulating the critic’s state transition process as a fully observable Markov Decision Process (MDP) by providing the simulated user’s hidden intent as privileged information during training, which significantly reduces convergence time.

The system was validated through a user study with 28 human participants tasked with landing a physical UAV on various platforms under challenging viewing conditions. The assistant, trained exclusively on simulated data, improved the task success rate from 51.4% to 98.2%. The study demonstrated that the system could infer landing goals without a priori knowledge of the environment structure. Furthermore, participants using the assistant achieved proficiency levels greater than those of the most experienced unassisted participants, regardless of their prior piloting experience. The inclusion of an LSTM cell in the policy network allowed for better handling of temporal information compared to previous approaches.

The significance of this work lies in its rigorous validation in a physical environment with naive users, distinguishing it from prior studies limited to simulation or small sample sizes. The proposed shared autonomy strategy effectively reduces the training costs and accessibility barriers for UAV piloting. By successfully transferring a model trained in simulation to reality and handling multiple ambiguous goals, the approach demonstrates robust performance in unseen conditions. This contributes to the field by providing a scalable solution for assistive robotics that enhances safety and performance for non-expert operators in complex, unstructured environments.
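
The combination of TD3 with a critic that receives the simulated user's hidden intent can be illustrated with a short sketch. What follows is the standard TD3 bootstrap target (target policy smoothing plus the minimum of two target critics), written in NumPy; it is not the authors' implementation. The function name `td3_target`, the linear stand-in networks, the dimension sizes, and the choice to concatenate the goal onto the critic input are all assumptions made for illustration, and the paper's LSTM policy and CM-VAE encoder are not modeled here.

```python
import numpy as np

def td3_target(reward, done, next_obs, next_goal,
               target_actor, target_q1, target_q2, rng,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Standard TD3 target; the critics additionally see a privileged goal."""
    # The actor only sees the observation (e.g. a latent vector), never the goal.
    next_action = target_actor(next_obs)
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Assumed layout: critic input = observation + hidden goal + action.
    critic_in = np.concatenate([next_obs, next_goal, next_action], axis=-1)
    # Clipped double-Q: bootstrap from the smaller of the two target critics.
    q_min = np.minimum(target_q1(critic_in), target_q2(critic_in))
    return reward + gamma * (1.0 - done) * q_min

# Hypothetical sizes and random linear "networks" standing in for real models.
rng = np.random.default_rng(0)
obs_dim, goal_dim, act_dim, batch = 8, 3, 4, 5
W_actor = rng.normal(size=(obs_dim, act_dim))
w_q1 = rng.normal(size=obs_dim + goal_dim + act_dim)
w_q2 = rng.normal(size=obs_dim + goal_dim + act_dim)

def target_actor(obs):
    return np.tanh(obs @ W_actor)

def target_q1(x):
    return x @ w_q1

def target_q2(x):
    return x @ w_q2

next_obs = rng.normal(size=(batch, obs_dim))
next_goal = rng.normal(size=(batch, goal_dim))  # known only in simulation
reward = rng.normal(size=batch)
done = np.zeros(batch)

y = td3_target(reward, done, next_obs, next_goal,
               target_actor, target_q1, target_q2, rng)
print(y.shape)
```

Because the goal is passed only to the critics, the actor can still be deployed without it: at test time only `target_actor`-style inputs (observations) are needed, which matches the summary's description of privileged information being used during training.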

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage      Outcome  Tool               Model                    Prompt   Attempts  Completed
discover   success  Crossref           -                        -        1         2026-06-25
archive    success  canonical_url      -                        -        1         2026-06-26
extract    success  cached             -                        -        2         2026-06-26
clean      success  clean              -                        -        1         2026-06-25
chunk      success  chunk              -                        -        1         2026-06-25
embed      success  embed              Qwen/Qwen3-Embedding-8B  -        1         2026-06-25
promote    success  -                  -                        -        1         2026-06-25
summarize  success  llm                qwen3.6-27b-prismaquant  summ-v5  1         2026-06-26
tag        success  vector_similarity  -                        -        6         2026-06-25
verify     success  -                  -                        -        1         2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.