WaveLander: A Generalizable Hierarchical Control Framework for UAV Landing on Wave-Disturbed Platforms via Reinforcement Learning
Pith reviewed 2026-07-03 20:38 UTC · model grok-4.3
The pith
A hierarchical RL policy outputs only vertical velocity for UAV landing on wave-moving platforms while a conventional controller handles the rest.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WaveLander is a hierarchical control framework via reinforcement learning that decouples vertical landing decision-making from low-level flight stabilization. The RL policy maps a compact platform-relative observation to a scalar vertical velocity reference, while a conventional low-level flight controller maintains attitude stability and lateral tracking. This formulation reduces dynamic platform landing to a low-dimensional, timing-aware control problem and enables smooth landing behavior without explicit switching rules. Simulation results under randomized wave-induced platform motions show that WaveLander achieves robust landing performance and generalizes to unseen disturbance conditions.
What carries the argument
The RL policy that maps platform-relative observations to a scalar vertical velocity reference, combined with a conventional low-level flight controller for attitude and lateral stability.
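The Python sketch below illustrates the interface just described. Every detail in it is an assumption made for illustration:

- the observation fields in `RelativeObs`;
- the clipping limits;
- the proportional lateral law standing in for the "conventional low-level flight controller";
- all function names.

None of these come from the paper. The one structural point the sketch shows is that the learned layer returns a single number.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class RelativeObs:
    """Hypothetical compact platform-relative observation (field choice is an assumption)."""
    rel_height: float      # UAV height above the platform surface [m]
    rel_vz: float          # UAV vertical velocity minus platform heave velocity [m/s]
    platform_roll: float   # platform roll angle [rad]
    platform_pitch: float  # platform pitch angle [rad]


def clip(value: float, limit: float) -> float:
    """Saturate value to the symmetric interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def vertical_reference(policy: Callable[[Sequence[float]], float],
                       obs: RelativeObs, vz_max: float = 1.0) -> float:
    """Learned layer: compact observation -> one scalar vz reference (positive up)."""
    features = [obs.rel_height, obs.rel_vz, obs.platform_roll, obs.platform_pitch]
    return clip(policy(features), vz_max)


def lateral_reference(uav_xy: Tuple[float, float], platform_xy: Tuple[float, float],
                      kp: float = 0.8, v_max: float = 2.0) -> Tuple[float, float]:
    """Stand-in for the conventional layer: proportional tracking of the platform center."""
    vx = kp * (platform_xy[0] - uav_xy[0])
    vy = kp * (platform_xy[1] - uav_xy[1])
    return clip(vx, v_max), clip(vy, v_max)


def velocity_setpoint(policy, obs, uav_xy, platform_xy) -> Tuple[float, float, float]:
    """Combine both layers into (vx, vy, vz); attitude control stays inside the autopilot."""
    vx, vy = lateral_reference(uav_xy, platform_xy)
    vz = vertical_reference(policy, obs)
    return vx, vy, vz
```

Consider a toy policy `lambda f: -0.5 * f[0]`, which descends in proportion to height. A UAV at (0, 0), 2 m above a platform centered at (1, -0.5), gets the setpoint (0.8, -0.4, -1.0). The lateral terms come from the conventional loop; only the vertical term comes from the policy.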
If this is right
- The system achieves robust landing performance under randomized wave-induced platform motions.
- Performance generalizes to disturbance conditions not seen during training.
- The approach reduces the landing task to a low-dimensional timing-aware control problem.
- Smooth landing occurs without explicit switching rules between flight modes.
Where Pith is reading between the lines
- The same split could be tested on real UAV hardware in actual sea conditions to check sim-to-real transfer.
- The compact observation space might allow the RL layer to be retrained quickly for new platform types such as moving ship decks.
- Because the low-level controller stays conventional, safety monitors could be added around only the vertical command without redesigning the full attitude loop.
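The last point can be sketched as a runtime monitor that modifies only the scalar vertical command and leaves the attitude loop untouched. The following are hypothetical choices for illustration; neither the paper nor this review specifies them:

- the function name;
- the tilt and height thresholds;
- the "stop descending when the deck is too tilted" rule.

```python
import math


def shield_vz(vz_ref: float, rel_height: float, platform_tilt: float,
              hold_height: float = 0.5,
              max_tilt: float = math.radians(10.0),
              max_descent: float = 0.6) -> float:
    """Hypothetical runtime monitor that edits only the scalar vertical command.

    vz_ref is positive up, so a descent command is negative. All thresholds are
    illustrative placeholders, not values from the paper.
    """
    near_deck = rel_height < hold_height
    # Close to the deck while the platform is strongly tilted: refuse to descend further.
    if near_deck and abs(platform_tilt) > max_tilt and vz_ref < 0.0:
        return 0.0
    # Close to the deck otherwise: cap the commanded descent rate.
    if near_deck:
        return max(vz_ref, -max_descent)
    # Far from the deck: pass the learned command through unchanged.
    return vz_ref
```

The monitor's input and output are the same scalar. It can therefore sit between the policy and the autopilot without changing either one, which is the practical benefit this bullet points to.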
Load-bearing premise
A conventional low-level flight controller can reliably maintain attitude stability and lateral tracking while the RL policy supplies only a scalar vertical velocity reference.
What would settle it
A test run in which the low-level controller loses lateral tracking or attitude stability under the same wave conditions used in the reported simulations, producing failed landings despite the RL policy remaining active.
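One way to make this test operational is to label each simulated trial by whether the low-level layer stayed within an assumed tracking and attitude envelope. The sketch below is hypothetical. The pad radius, tilt limit, and label names are placeholders, and the logs are assumed to be per-step samples.

```python
import math
from typing import Sequence


def classify_trial(landed: bool,
                   lateral_err: Sequence[float],
                   tilt: Sequence[float],
                   pad_radius: float = 0.5,
                   tilt_limit: float = math.radians(30.0)) -> str:
    """Label one simulated landing attempt (thresholds are illustrative assumptions).

    lateral_err: horizontal distance from the UAV to the platform center at each step [m]
    tilt:        UAV tilt from vertical at each step [rad]
    """
    low_level_lost = (max(lateral_err) > pad_radius
                      or max(abs(a) for a in tilt) > tilt_limit)
    if not landed and low_level_lost:
        return "counterexample"    # failed landing with the low-level layer out of its envelope
    if not landed:
        return "vertical_failure"  # low-level layer held; the vertical decision failed
    if low_level_lost:
        return "landed_degraded"   # landed, but tracking or attitude left the envelope
    return "success"
```

A single "counterexample" under the reported wave conditions would be exactly the outcome described above. Counting labels across many randomized trials also separates vertical-timing failures from low-level ones.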
Original abstract
Autonomous landing of unmanned aerial vehicles (UAVs) on wave-disturbed marine platforms remains challenging due to stochastic platform motion, time-varying platform attitude, and uncertain touchdown conditions. Existing model-based methods often require accurate motion prediction and online optimization, while end-to-end learning approaches may suffer from high training complexity and limited interpretability. This paper presents WaveLander, a hierarchical control framework via reinforcement learning (RL) that decouples vertical landing decision-making from low-level flight stabilization. The RL policy maps a compact platform-relative observation to a scalar vertical velocity reference, while a conventional low-level flight controller maintains attitude stability and lateral tracking. This formulation reduces dynamic platform landing to a low-dimensional, timing-aware control problem and enables smooth landing behavior without explicit switching rules. Simulation results under randomized wave-induced platform motions show that WaveLander achieves robust landing performance and generalizes to unseen disturbance conditions, demonstrating the potential of hierarchical learning-based control for marine UAV recovery.
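The abstract mentions randomized wave-induced platform motions and unseen disturbance conditions, but does not state how they are generated. A simple, common stand-in is a sum of sinusoids with randomized amplitude, frequency, and phase. The sketch below uses that approach and holds out a higher-amplitude, higher-frequency band as the "unseen" condition. The ranges, component count, and function names are assumptions, not the paper's settings.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WaveComponent:
    amplitude: float  # [m] for heave, [rad] for roll or pitch
    omega: float      # angular frequency [rad/s]
    phase: float      # initial phase [rad]


def sample_components(rng: random.Random, n: int,
                      amp_range: Tuple[float, float],
                      omega_range: Tuple[float, float]) -> List[WaveComponent]:
    """Draw n sinusoidal components with uniformly randomized parameters."""
    return [WaveComponent(rng.uniform(*amp_range),
                          rng.uniform(*omega_range),
                          rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n)]


def evaluate(components: List[WaveComponent], t: float) -> Tuple[float, float]:
    """Return the displacement and its time derivative at time t."""
    pos = sum(c.amplitude * math.sin(c.omega * t + c.phase) for c in components)
    vel = sum(c.amplitude * c.omega * math.cos(c.omega * t + c.phase) for c in components)
    return pos, vel


rng = random.Random(0)
# Training distribution (illustrative ranges, not the paper's).
train_heave = sample_components(rng, 3, amp_range=(0.05, 0.3), omega_range=(0.5, 1.5))
# Held-out "unseen" condition: larger and faster waves than any seen in training.
unseen_heave = sample_components(rng, 3, amp_range=(0.3, 0.5), omega_range=(1.5, 2.5))
```

The same generator can drive heave, roll, and pitch separately, with amplitudes in metres or radians. `evaluate` also returns the rate of change, which a platform-relative observation would typically need.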
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents WaveLander, a hierarchical RL framework for UAV landing on wave-disturbed marine platforms. It decouples the problem by having an RL policy map compact platform-relative observations to a scalar vertical velocity reference, while a conventional low-level controller handles attitude stability and lateral tracking. Simulation results under randomized wave-induced motions are claimed to demonstrate robust landing performance and generalization to unseen disturbances.
Significance. If the simulation evidence holds once the identified gaps are addressed, the work would illustrate a practical way to reduce high-dimensional marine landing to a low-dimensional, timing-aware problem. It would offer better interpretability than end-to-end RL while avoiding explicit switching rules and full online optimization.
Major comments (2)
- [Abstract] The central claim of 'robust landing performance and generalization to unseen disturbance conditions' rests on aggregate success rates. The supporting evidence (methods, data, error metrics, per-trial lateral tracking error, attitude deviation) is absent, so a reader cannot evaluate whether the low-level controller actually maintains stability under the stated conditions.
- [Abstract] Hierarchical formulation: the robustness claim depends on the assumption that a standard low-level tracker (PID or geometric) can reliably reject wave-induced lateral forces and attitude disturbances while the RL policy supplies only a scalar vertical velocity reference. Yet the paper provides no ablation isolating the low-level controller, no lateral tracking error statistics, and no full 6-DOF coupling analysis under randomized wave spectra.
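The statistics the referee requests are straightforward to compute from per-step logs. The sketch below uses hypothetical function names and assumes each trial's log holds per-step lateral error, roll, and pitch. It reports per-trial RMS lateral error and maximum tilt, then summarizes them across trials with a mean and a nearest-rank 95th percentile. Tilt is computed exactly for a ZYX (yaw-pitch-roll) Euler convention, which is itself an assumption about the simulator's attitude representation.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def rms(values: Sequence[float]) -> float:
    """Root-mean-square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def tilt_angle(roll: float, pitch: float) -> float:
    """Angle between the body z-axis and world vertical for a ZYX (yaw-pitch-roll) attitude."""
    return math.acos(max(-1.0, min(1.0, math.cos(roll) * math.cos(pitch))))


def trial_metrics(lateral_err: Sequence[float], roll: Sequence[float],
                  pitch: Sequence[float]) -> Dict[str, float]:
    """Per-trial lateral tracking error and worst attitude deviation."""
    tilts = [tilt_angle(r, p) for r, p in zip(roll, pitch)]
    return {"rms_lateral_err": rms(lateral_err), "max_tilt": max(tilts)}


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(trials: List[Tuple[Sequence[float], Sequence[float], Sequence[float]]]) -> Dict[str, float]:
    """Aggregate per-trial metrics into mean and tail statistics."""
    per_trial = [trial_metrics(*t) for t in trials]
    lat = [m["rms_lateral_err"] for m in per_trial]
    tilt = [m["max_tilt"] for m in per_trial]
    return {
        "mean_rms_lateral_err": statistics.mean(lat),
        "p95_rms_lateral_err": percentile(lat, 95.0),
        "mean_max_tilt": statistics.mean(tilt),
        "p95_max_tilt": percentile(tilt, 95.0),
    }
```

Reporting a tail statistic alongside the mean matters here. A low-level controller that fails in only a few wave realizations would barely move the average.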
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract and the hierarchical formulation. We address the major comments point by point below.
Point-by-point responses
- Referee: [Abstract] The central claim of 'robust landing performance and generalization to unseen disturbance conditions' rests on aggregate success rates whose supporting evidence (methods, data, error metrics, per-trial lateral tracking error, attitude deviation) is absent, preventing evaluation of whether the low-level controller actually maintains stability under the stated conditions.
Authors: The abstract summarizes the key results. The Experiments section of the full manuscript describes the simulation methods, randomized wave spectra, and aggregate success rates. We acknowledge that specific supporting metrics for low-level controller stability (lateral tracking error and attitude deviation) are not reported at the level of detail requested. In the revision we will add these per-trial statistics and a short discussion of how they confirm stability under the tested conditions. Revision: yes.
- Referee: [Abstract] Hierarchical formulation: the assumption that a standard low-level tracker (PID or geometric) can reliably reject wave-induced lateral forces and attitude disturbances, while the RL policy supplies only a scalar vertical velocity reference, is load-bearing for the robustness claim. Yet no ablation isolating the low-level controller, no lateral tracking error statistics, and no full 6-DOF coupling analysis under randomized wave spectra are provided.
Authors: The design intentionally delegates lateral and attitude stabilization to a conventional controller so that the RL policy can operate on a low-dimensional vertical reference. We agree that explicit evidence for this separation would strengthen the robustness argument. The revised manuscript will include three additions under the randomized wave conditions already used in our simulations: an ablation that isolates the low-level controller, lateral tracking error statistics, and a 6-DOF coupling analysis. Revision: yes.
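A minimal version of the promised ablation freezes the vertical channel and exercises only the lateral loop against platform sway. The sketch below is a one-axis toy, not the paper's simulator. Three elements are assumptions: the proportional gain, the first-order velocity lag `tau` standing in for attitude and thrust dynamics, and the sinusoidal sway.

```python
import math


def lateral_ablation(kp: float, sway_amp: float, sway_omega: float,
                     tau: float = 0.2, dt: float = 0.01, duration: float = 20.0) -> float:
    """RMS lateral tracking error of a P velocity loop following a swaying platform.

    The vertical channel is frozen (no RL policy), so only the conventional lateral
    loop is exercised. The first-order velocity lag tau is a crude stand-in for the
    attitude and thrust dynamics of a real multirotor.
    """
    x, v = 0.0, 0.0              # UAV lateral position [m] and velocity [m/s]
    sq_err = 0.0
    steps = int(round(duration / dt))
    for k in range(steps):
        t = k * dt
        x_platform = sway_amp * math.sin(sway_omega * t)
        err = x_platform - x     # tracking error at time t
        sq_err += err * err
        v_cmd = kp * err         # proportional velocity command
        v += (v_cmd - v) * dt / tau  # first-order velocity response (explicit Euler)
        x += v * dt
    return math.sqrt(sq_err / steps)
```

The next step would be to sweep `kp`, `sway_amp`, and `sway_omega` over the wave ranges used in training and in the unseen conditions. That sweep would show where the lateral loop's error approaches the landing-pad radius, which is the boundary behind the referee's concern.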
Circularity Check
No circularity; claims rest on independent simulation outcomes
Rationale
The paper defines a hierarchical controller (RL policy outputs scalar vertical velocity reference; conventional low-level controller handles attitude/lateral tracking) and validates it via simulation under randomized wave motions, reporting aggregate landing success and generalization. No derivation chain reduces by construction to its inputs, no fitted parameters are relabeled as predictions, and no load-bearing self-citations or uniqueness theorems appear. The empirical results are external to the method definition and do not rely on self-referential fitting or renaming.
Reference graph
Works this paper leans on
- [1] H. W. Lin, P. Wang, Z. Yang, K. C. Leung, F. Bao, K. Y. Kui, J. X. E. Xu, and L. Shi, “Coastal underwater evidence search system with surface-underwater collaboration,” in IEEE International Conference on Control, Automation, Robotics and Vision, pp. 1047–1053, 2024.
- [2] H.-T. Zhang, B.-B. Hu, B. Liu, J. Ding, J. Zhao, H. Su, Y. Zhang, C. Zhu, Y. Yuan, and Y. Shi, “Aerial-marine cross-domain uncrewed systems: An overview of cyberphysical coordination frameworks for marine applications,” IEEE Control Systems, vol. 45, no. 4, pp. 28–45, 2025.
- [3] S. Li, Y. Zhu, G. Guo, P. Yuan, and J. Bai, “A separation and rendezvous control method for the UAV-USV system based on distributed NMPC,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 11, pp. 7251–7263, 2024.
- [4] P. Wang, C. Wang, J. Wang, and M. Q.-H. Meng, “Quadrotor autonomous landing on moving platform,” Procedia Computer Science, vol. 209, pp. 40–49, 2022.
- [5] P. M. Gupta, É. Pairet, T. Nascimento, and M. Saska, “Landing a UAV in harsh winds and turbulent open waters,” IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 744–751, 2023.
- [6] R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton, and A. Cangelosi, “Toward end-to-end control for UAV autonomous landing via deep reinforcement learning,” in 2018 International Conference on Unmanned Aircraft Systems, pp. 115–123, 2018.
- [7] W. Li, Y. Ge, Z. Guan, and G. Ye, “Synchronized motion-based UAV-USV cooperative autonomous landing,” Journal of Marine Science and Engineering, vol. 10, no. 9, p. 1214, 2022.
- [8] W. Li, Y. Ge, Z. Guan, H. Gao, and H. Feng, “NMPC-based UAV-USV cooperative tracking and landing,” Journal of the Franklin Institute, vol. 360, no. 11, pp. 7481–7500, 2023.
- [9] R. Xu, C. Liu, Z. Cao, Y. Wang, and H. Qian, “A manipulator-assisted multiple UAV landing system for USV subject to disturbance,” Ocean Engineering, vol. 299, p. 117306, 2024.
- [10] A. Rodriguez-Ramos, C. Sampedro, H. Bavle, P. de la Puente, and P. Campoy, “A deep reinforcement learning strategy for UAV autonomous landing on a moving platform,” Journal of Intelligent & Robotic Systems, vol. 93, pp. 351–366, 2019.
- [11] P. Goldschmid and A. Ahmad, “Reinforcement learning based autonomous multi-rotor landing on moving platforms,” Autonomous Robots, vol. 48, no. 4, p. 13, 2024.
- [12] V. Saj, B. Lee, D. Kalathil, and M. Benedict, “Robust reinforcement learning algorithm for vision-based ship landing of UAVs,” arXiv preprint arXiv:2209.08381, 2022.
- [13] R. Peter, L. Ratnabala, D. Aschu, A. Fedoseev, and D. Tsetserukou, “Lander.ai: Adaptive landing behavior agent for expertise in 3D dynamic platform landings,” arXiv preprint arXiv:2403.06572, 2024.
- [14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
- [15] G. E. Uhlenbeck and L. S. Ornstein, “On the theory of the Brownian motion,” Physical Review, vol. 36, no. 5, pp. 823–841, 1930.
- [16] M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano-Muñoz, X. Yao, R. Zurbrügg, N. Rudin, et al., “Isaac Lab: A GPU-accelerated simulation framework for multi-modal robot learning,” arXiv preprint arXiv:2511.04831, 2025.
- [17] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
- [18] C. Schwarke, M. Mittal, N. Rudin, D. Hoeller, and M. Hutter, “RSL-RL: A learning library for robotics research,” arXiv preprint arXiv:2509.10771, 2025.
- [19] M. Jacinto, J. Pinto, J. Patrikar, J. Keller, R. Cunha, S. Scherer, and A. Pascoal, “Pegasus simulator: An Isaac Sim framework for multiple aerial vehicles simulation,” in 2024 International Conference on Unmanned Aircraft Systems, pp. 917–922, 2024.
- [20] S. Macenski et al., “Robot Operating System 2: Design, architecture, and uses in the wild,” Science Robotics, vol. 7, no. 66, p. eabm6074, 2022.
- [21] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033, 2012.