Master Thesis: Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning

Collaborative transportation and manipulation of cable-suspended loads by multiple UAVs is a promising way to expand the role of UAVs in heavy-lifting operations. Existing approaches for collaborative aerial manipulation of a payload along a reference trajectory typically rely on either centralized control architectures or reliable inter-agent communication. In this work, we propose a novel machine-learning-based method for decentralized kinodynamic planning that operates effectively under partial observability and without inter-agent communication.

Real-World Experiments

We validate our approach through a one-shot sim-to-real transfer of the trained policy, demonstrating its effectiveness in real-world scenarios. Video demonstrations are available below:

Decentralized ML student policy:

NMPC teacher policy:

These results have been submitted for peer review to the upcoming IEEE MRS conference.

Approach

Our method uses imitation learning to train a decentralized, homogeneous student policy that imitates a centralized kinodynamic motion planner. This centralized teacher policy is an online NMPC planner built with the Acados framework, and it has access to privileged global observations.

Using the DAgger algorithm, we distil the teacher policy into a deep-learning-based online kinodynamic planner. The resulting student policy is decentralized and strongly homogeneous, i.e., every UAV independently runs a copy of the same policy to accomplish the same task. Using homogeneous agents with shared parameters ensures scalability and excellent sample efficiency.
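To make the distillation loop concrete, the following is a minimal sketch of DAgger on a toy problem, not the thesis implementation. Everything in it is an assumption made for illustration: a linear feedback law stands in for the NMPC teacher, a least-squares linear map stands in for the deep student network, the dynamics are a toy double integrator, and the student sees the full state rather than a partial observation. The names `teacher`, `rollout`, `fit`, and `dagger` are hypothetical. The sketch also uses a common simplification of DAgger in which the teacher drives only the first rollout and the student drives all later ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy double-integrator dynamics (assumption, stands in for the UAV-load system).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[1.0, 1.5]])  # hypothetical teacher gain


def teacher(x):
    """Stand-in for the centralized teacher: linear state feedback."""
    return -K @ x


def rollout(policy, steps=30):
    """Run a policy in the toy system and return the states it visits."""
    x = rng.normal(size=2)
    visited = []
    for _ in range(steps):
        visited.append(x.copy())
        x = A @ x + B @ policy(x)
    return visited


def fit(X, U):
    """Stand-in for supervised training: least-squares linear student."""
    W, *_ = np.linalg.lstsq(X, U, rcond=None)
    return lambda x, W=W: W.T @ x


def dagger(teacher, rollout, fit, n_iters=5):
    """Aggregate teacher-labeled states visited by the current policy."""
    X, U = [], []
    policy = teacher  # first rollout is driven by the teacher
    for _ in range(n_iters):
        visited = rollout(policy)
        X.extend(visited)
        U.extend(teacher(x) for x in visited)  # teacher relabels every state
        policy = fit(np.array(X), np.array(U))  # retrain on the full dataset
    return policy


student = dagger(teacher, rollout, fit)
```

The key step is the relabeling: states are collected under the student's own behavior, but the targets always come from the teacher, so the student learns how to recover from its own mistakes. With homogeneous agents that share parameters, one would expect each rollout step to contribute one labeled sample per UAV to the same dataset, which is one plausible reading of the sample-efficiency benefit mentioned above.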

Generalizability

A single deep learning policy is trained to follow all six trajectories shown below. For the trajectories on the left, the desired load orientation is chosen so that the side-slip angle is zero. For the trajectories on the right, the desired load orientation is constant.
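The two orientation references can be illustrated with a short sketch. It assumes planar motion, with the load orientation reduced to a yaw angle. Under that assumption, zero side-slip means the yaw follows the direction of the reference velocity. The function name `zero_sideslip_yaw` and the circular test trajectory are hypothetical and are not taken from the training set.

```python
import numpy as np


def zero_sideslip_yaw(velocity_xy):
    """Yaw aligned with the planar velocity, i.e. zero side-slip angle.

    velocity_xy: array of shape (T, 2) with the reference velocity (vx, vy).
    Returns an unwrapped yaw trajectory in radians, shape (T,).
    """
    v = np.asarray(velocity_xy, dtype=float)
    # np.unwrap removes the 2*pi jumps that arctan2 introduces.
    return np.unwrap(np.arctan2(v[:, 1], v[:, 0]))


# Hypothetical reference: one lap around a unit circle.
t = np.linspace(0.0, 2.0 * np.pi, 200)
velocity = np.stack([-np.sin(t), np.cos(t)], axis=1)

yaw_zero_sideslip = zero_sideslip_yaw(velocity)  # heading follows the path
yaw_constant = np.zeros_like(t)                  # fixed orientation
```

On the circle, the zero side-slip yaw grows steadily by a full turn over one lap, while the constant reference leaves the load pointing in the same direction throughout.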

The same policy can also transport the payload along the Zandvoort F1 track, a trajectory that was not part of the training set. A thorough analysis of the performance of our method is provided here: https://repository.tudelft.nl/record/uuid:39d40ea4-a4c3-4c8d-8ab4-81730dca3be3