
arxiv: 2607.01281 · v1 · pith:56BOS2HR · submitted 2026-07-01 · 💻 cs.RO

WaveLander: A Generalizable Hierarchical Control Framework for UAV Landing on Wave-Disturbed Platforms via Reinforcement Learning

Pith reviewed 2026-07-03 20:38 UTC · model grok-4.3

classification 💻 cs.RO
keywords UAV landing · reinforcement learning · hierarchical control · wave-disturbed platforms · marine UAV recovery · autonomous landing · RL policy

The pith

A hierarchical RL policy outputs only a vertical velocity command for UAV landing on wave-disturbed platforms, while a conventional controller handles the rest.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WaveLander, a hierarchical framework that splits UAV landing on wave-disturbed marine platforms into two layers. An RL agent decides the timing and rate of descent by outputting a single vertical velocity command from compact observations of the platform's motion. A standard low-level flight controller then keeps the vehicle stable in attitude and position. This turns a high-dimensional, coupled control task into a simpler timing decision. Simulations with randomized wave motions show the approach lands reliably and works on disturbance patterns not encountered during training.
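The summary does not say how the randomized wave motion is generated. One common way to produce temporally correlated random disturbances is an Ornstein-Uhlenbeck (OU) process; the paper's reference list includes Uhlenbeck and Ornstein [15], but whether and how it drives the platform is an assumption here. The sketch below (standard library only) draws fresh OU parameters per episode and samples heave, roll, and pitch traces; the axis names and parameter ranges are hypothetical.

```python
import math
import random


def ou_trajectory(n_steps, dt, theta, sigma, mu=0.0, x0=0.0, rng=None):
    """Sample an OU path dx = theta*(mu - x)*dt + sigma*dW (theta > 0).

    Uses the exact discretization, so it stays stable for any step size dt.
    """
    if rng is None:
        rng = random.Random()
    decay = math.exp(-theta * dt)
    # Standard deviation of the Gaussian increment over one step of length dt.
    step_std = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    xs = [x0]
    for _ in range(n_steps - 1):
        xs.append(mu + (xs[-1] - mu) * decay + step_std * rng.gauss(0.0, 1.0))
    return xs


# Hypothetical randomization ranges per axis: ((theta_lo, theta_hi), (sigma_lo, sigma_hi)).
AXIS_RANGES = {
    "heave_m": ((0.3, 1.5), (0.05, 0.4)),
    "roll_rad": ((0.5, 2.0), (0.02, 0.15)),
    "pitch_rad": ((0.5, 2.0), (0.02, 0.15)),
}


def random_platform_motion(n_steps, dt, seed=None):
    """Draw new OU parameters for each axis, then sample one trace per axis."""
    rng = random.Random(seed)
    motion = {}
    for axis, ((th_lo, th_hi), (sg_lo, sg_hi)) in AXIS_RANGES.items():
        theta = rng.uniform(th_lo, th_hi)
        sigma = rng.uniform(sg_lo, sg_hi)
        motion[axis] = ou_trajectory(n_steps, dt, theta, sigma, rng=rng)
    return motion
```

Holding out a range of `theta` and `sigma` values from training would be one way to construct the "unseen disturbance conditions" the paper evaluates on.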

Core claim

WaveLander is a hierarchical control framework via reinforcement learning that decouples vertical landing decision-making from low-level flight stabilization. The RL policy maps a compact platform-relative observation to a scalar vertical velocity reference, while a conventional low-level flight controller maintains attitude stability and lateral tracking. This formulation reduces dynamic platform landing to a low-dimensional, timing-aware control problem and enables smooth landing behavior without explicit switching rules. Simulation results under randomized wave-induced platform motions show that WaveLander achieves robust landing performance and generalizes to unseen disturbance conditions.

What carries the argument

The RL policy that maps platform-relative observations to a scalar vertical velocity reference, combined with a conventional low-level flight controller for attitude and lateral stability.
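The division of labor can be sketched as a single control step. This is a minimal illustration, not the authors' implementation: `placeholder_policy` is a hand-written stand-in for the trained RL policy, the `Observation` fields are assumptions about what a "compact platform-relative observation" might contain, and the lateral layer is a simple proportional tracker with velocity feed-forward standing in for the conventional low-level controller (which in practice also closes the attitude loop).

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical compact, platform-relative observation."""
    rel_height: float      # UAV height above the deck [m]
    rel_vz: float          # UAV vertical velocity relative to the deck [m/s]
    platform_roll: float   # deck roll [rad]
    platform_pitch: float  # deck pitch [rad]


def platform_tilt(roll, pitch):
    """Angle between the deck normal and vertical, for ZYX Euler angles."""
    return math.acos(max(-1.0, min(1.0, math.cos(roll) * math.cos(pitch))))


def placeholder_policy(obs, tilt_ok=math.radians(5.0), v_max=1.0):
    """Stand-in for the RL policy: returns a scalar vertical velocity reference.

    Negative values mean descend. A trained policy would learn this timing;
    here we simply descend while the deck is near level and hold otherwise.
    """
    if platform_tilt(obs.platform_roll, obs.platform_pitch) <= tilt_ok:
        return -min(v_max, 0.5 * obs.rel_height + 0.2)
    return 0.0


def velocity_setpoint(vz_ref, uav_xy, deck_xy, deck_vxy, kp=1.0):
    """Conventional lateral layer: P tracking of the deck plus velocity feed-forward."""
    vx = deck_vxy[0] + kp * (deck_xy[0] - uav_xy[0])
    vy = deck_vxy[1] + kp * (deck_xy[1] - uav_xy[1])
    return (vx, vy, vz_ref)  # handed to the flight controller's velocity loop
```

Only `vz_ref` comes from the learned layer; swapping the placeholder for a trained network changes nothing in `velocity_setpoint`, which is the decoupling the paper relies on.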

If this is right

  • The system achieves robust landing performance under randomized wave-induced platform motions.
  • Performance generalizes to disturbance conditions not seen during training.
  • The approach reduces the landing task to a low-dimensional timing-aware control problem.
  • Smooth landing occurs without explicit switching rules between flight modes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same split could be tested on real UAV hardware over actual sea conditions to check transfer from simulation.
  • The compact observation space might allow the RL layer to be retrained quickly for new platform types such as moving ship decks.
  • Because the low-level controller stays conventional, safety monitors could be added around only the vertical command without redesigning the full attitude loop.
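The third point can be made concrete with a monitor that sits between the RL output and the flight controller and touches only the vertical reference. A minimal sketch under assumed limits; the function name, thresholds, and guard-height rule are hypothetical, not taken from the paper.

```python
import math


def safety_filter(vz_ref, rel_height, deck_tilt, *,
                  vz_min=-1.0, vz_max=1.0,
                  guard_height=0.5, tilt_limit=math.radians(10.0)):
    """Filter the scalar vertical command; attitude and lateral loops are untouched.

    1. Saturate the command to a fixed velocity envelope.
    2. Near the deck, veto further descent while the deck is steeply tilted
       (holding or climbing is still allowed, so a retreat remains possible).
    """
    vz = min(max(vz_ref, vz_min), vz_max)
    if rel_height < guard_height and deck_tilt > tilt_limit and vz < 0.0:
        vz = 0.0
    return vz
```

Because the filter's input and output are a single scalar, its behavior can be checked exhaustively over a grid of heights and tilts, which is much harder for a monitor wrapped around a full attitude controller.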

Load-bearing premise

A conventional low-level flight controller can reliably maintain attitude stability and lateral tracking while the RL policy supplies only a scalar vertical velocity reference.

What would settle it

A test run in which the low-level controller loses lateral tracking or attitude stability under the same wave conditions used in the reported simulations, producing failed landings despite the RL policy remaining active.
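Running that test requires attributing each failed landing to a cause rather than only counting successes. A hypothetical per-trial classifier is sketched below; the thresholds and label names are illustrative assumptions, the logged signals would have to be recorded by the simulator, and in practice the lateral-error check would likely be restricted to the final approach window.

```python
import math


def classify_trial(lateral_errors, uav_tilts, touchdown_mismatch, *,
                   max_lateral_error=0.3, max_uav_tilt=math.radians(35.0),
                   theta_ok=math.radians(5.0)):
    """Label one landing trial, checking low-level failures before timing failures.

    lateral_errors     -- per-step horizontal distance to the deck centre [m]
    uav_tilts          -- per-step UAV tilt from vertical [rad]
    touchdown_mismatch -- UAV/deck attitude mismatch at contact [rad], or None
    """
    if max(uav_tilts) > max_uav_tilt:
        return "attitude_instability"   # low-level controller lost attitude
    if max(lateral_errors) > max_lateral_error:
        return "lateral_tracking_loss"  # low-level controller lost the deck
    if touchdown_mismatch is None:
        return "no_touchdown"           # policy never committed to landing
    if touchdown_mismatch > theta_ok:
        return "unsafe_touchdown"       # timing failure of the RL layer
    return "success"


def low_level_failure_share(labels):
    """Fraction of failed trials attributable to the low-level controller."""
    failures = [lab for lab in labels if lab != "success"]
    if not failures:
        return 0.0
    low_level = {"attitude_instability", "lateral_tracking_loss"}
    return sum(lab in low_level for lab in failures) / len(failures)
```

A non-trivial `low_level_failure_share` under the reported wave conditions would be the kind of evidence described above.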

Figures

Figures reproduced from arXiv: 2607.01281 by Chun-Kit Li, Hin Wang Lin, Iok Long Sit, Ka Yu Kui, Ling Shi, Ming Fung Siu, Pengyu Wang.

Figure 1. Overview visualization of WaveLander. (a) MuJoCo simulation rendering of timing-aware landing on a wave-disturbed platform. (b) Real-world deployment illustration showing representative landing and retreat behaviors.
Figure 2. System overview of the proposed framework. Timing-related landing behavior is encouraged through reward shaping rather than explicit platform-motion prediction or threshold-based switching.
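The caption says timing emerges from reward shaping, but the reward terms are not given here. The sketch below is one hypothetical shaped reward in that spirit, with every weight, threshold, and term an assumption: descent progress is rewarded only in proportion to how level the deck is, abrupt changes in the vertical command are penalized, and touchdown is scored against the tilt threshold θ_ok used in Figure 3.

```python
import math


def shaped_reward(prev_height, height, deck_tilt, vz, prev_vz, touched_down, *,
                  theta_ok=math.radians(5.0), w_progress=1.0, w_smooth=0.1,
                  r_land=10.0, r_bad_touchdown=-10.0):
    """One-step reward for a vertical-velocity policy (illustrative only)."""
    # 1.0 on a level deck, falling linearly to 0.0 at the tilt threshold.
    levelness = max(0.0, 1.0 - deck_tilt / theta_ok)
    # Descent (prev_height > height) pays off only while the deck is near level,
    # which nudges the agent to wait out large wave-induced tilts.
    reward = w_progress * levelness * (prev_height - height)
    # Penalize jerky vertical commands to encourage smooth landings.
    reward -= w_smooth * (vz - prev_vz) ** 2
    if touched_down:
        reward += r_land if deck_tilt <= theta_ok else r_bad_touchdown
    return reward
```

No explicit "land now" switch appears anywhere; any waiting behavior would have to be learned from how these terms trade off, which matches the caption's framing.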
Figure 3. Policy behavior under wave disturbances. Shaded regions denote safe touchdown intervals with θ_tilt,t ≤ θ_ok.
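The shaded regions in Figure 3 can be reproduced from a logged tilt trace by marking the contiguous stretches where θ_tilt,t ≤ θ_ok. A minimal sketch, assuming a sampled trace; each interval is reported by the times of its first and last safe samples, and the function names are hypothetical.

```python
def safe_touchdown_intervals(times, tilts, theta_ok):
    """Return [(t_start, t_end), ...] for contiguous runs with tilt <= theta_ok."""
    intervals = []
    start = None
    prev_t = None
    for t, tilt in zip(times, tilts):
        if tilt <= theta_ok:
            if start is None:
                start = t                      # entering a safe window
        elif start is not None:
            intervals.append((start, prev_t))  # window closed at last safe sample
            start = None
        prev_t = t
    if start is not None:
        intervals.append((start, prev_t))      # trace ended inside a safe window
    return intervals


def safe_fraction(tilts, theta_ok):
    """Share of samples in which touching down would satisfy the tilt limit."""
    return sum(tilt <= theta_ok for tilt in tilts) / len(tilts)
```

Overlaying the policy's touchdown time on these intervals shows directly whether it landed inside a safe window.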
Figure 6. Real-world deployment setup under moderate wind and wave conditions.
Figure 5. Cumulative distribution of touchdown attitude mismatch θ_td in MuJoCo. WaveLander shifts the distribution toward lower mismatch values, indicating safer touchdown timing under wave-induced platform motion.
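The comparison in Figure 5 is an empirical CDF of per-trial touchdown mismatch θ_td: a curve lying further left (higher at any given mismatch) means more trials touched down nearly level with the deck. A minimal sketch of how such curves and a single-threshold summary could be computed; the function names are illustrative.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return (x, F(x)) pairs at each sorted sample, F being the fraction <= x.

    With tied samples, the last pair at a given x carries the CDF value.
    """
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def cdf_at(samples, threshold):
    """Fraction of trials whose touchdown mismatch is <= threshold."""
    xs = sorted(samples)
    return bisect_right(xs, threshold) / len(xs)
```

Evaluating `cdf_at` for each method at θ_ok gives a one-number version of the shift shown in the figure.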
Abstract

Autonomous landing of unmanned aerial vehicles (UAVs) on wave-disturbed marine platforms remains challenging due to stochastic platform motion, time-varying platform attitude, and uncertain touchdown conditions. Existing model-based methods often require accurate motion prediction and online optimization, while end-to-end learning approaches may suffer from high training complexity and limited interpretability. This paper presents WaveLander, a hierarchical control framework via reinforcement learning (RL) that decouples vertical landing decision-making from low-level flight stabilization. The RL policy maps a compact platform-relative observation to a scalar vertical velocity reference, while a conventional low-level flight controller maintains attitude stability and lateral tracking. This formulation reduces dynamic platform landing to a low-dimensional, timing-aware control problem and enables smooth landing behavior without explicit switching rules. Simulation results under randomized wave-induced platform motions show that WaveLander achieves robust landing performance and generalizes to unseen disturbance conditions, demonstrating the potential of hierarchical learning-based control for marine UAV recovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents WaveLander, a hierarchical RL framework for UAV landing on wave-disturbed marine platforms. It decouples the problem by having an RL policy map compact platform-relative observations to a scalar vertical velocity reference, while a conventional low-level controller handles attitude stability and lateral tracking. Simulation results under randomized wave-induced motions are claimed to demonstrate robust landing performance and generalization to unseen disturbances.

Significance. If the simulation evidence holds after addressing the identified gaps, the work would illustrate a practical way to reduce high-dimensional marine landing to a low-dimensional timing-aware problem, offering improved interpretability over end-to-end RL while avoiding the need for explicit switching rules or full online optimization.

major comments (2)
  1. [Abstract] The central claim of 'robust landing performance and generalization to unseen disturbance conditions' rests on aggregate success rates whose supporting evidence (methods, data, error metrics, per-trial lateral tracking error, attitude deviation) is absent, preventing evaluation of whether the low-level controller actually maintains stability under the stated conditions.
  2. [Abstract] Hierarchical formulation (described in Abstract): the assumption that a standard low-level tracker (PID or geometric) can reliably reject wave-induced lateral forces and attitude disturbances while tracking only a scalar vertical velocity reference is load-bearing for the robustness claim, yet no ablation isolating the low-level controller, no lateral tracking error statistics, and no full 6DOF coupling analysis under randomized wave spectra are provided.
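The statistics requested in comment 2 are cheap to compute once per-step errors are logged. A minimal sketch, assuming each trial logs the horizontal position error to the deck as (x, y) pairs and the UAV's attitude deviation from its commanded attitude; the function names and the nearest-rank percentile are illustrative choices, not the paper's evaluation code.

```python
import math
import statistics


def trial_metrics(lateral_err_xy, attitude_dev):
    """Per-trial summary: RMS and max horizontal error [m], max |attitude deviation| [rad]."""
    dists = [math.hypot(ex, ey) for ex, ey in lateral_err_xy]
    return {
        "lateral_rms": math.sqrt(sum(d * d for d in dists) / len(dists)),
        "lateral_max": max(dists),
        "attitude_max": max(abs(a) for a in attitude_dev),
    }


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(trials):
    """Aggregate per-trial metrics across a batch of randomized wave episodes."""
    summary = {}
    for key in ("lateral_rms", "lateral_max", "attitude_max"):
        vals = [t[key] for t in trials]
        summary[key + "_mean"] = statistics.fmean(vals)
        summary[key + "_p95"] = nearest_rank_percentile(vals, 95)
    return summary
```

Reporting a tail statistic such as the 95th percentile alongside the mean matters here, because a few large lateral excursions are exactly the failure mode the referee is asking about.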

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the hierarchical formulation. We address the major comments point by point below.

  1. Referee: [Abstract] The central claim of 'robust landing performance and generalization to unseen disturbance conditions' rests on aggregate success rates whose supporting evidence (methods, data, error metrics, per-trial lateral tracking error, attitude deviation) is absent, preventing evaluation of whether the low-level controller actually maintains stability under the stated conditions.

    Authors: The abstract summarizes the key results; the full manuscript describes the simulation methods, randomized wave spectra, and aggregate success rates in the Experiments section. We acknowledge that specific supporting metrics for low-level controller stability (lateral tracking error and attitude deviation) are not presented at the level of detail requested. In the revision we will add these per-trial statistics and a short discussion of how they confirm stability under the tested conditions. revision: yes

  2. Referee: [Abstract] Hierarchical formulation (described in Abstract): the assumption that a standard low-level tracker (PID or geometric) can reliably reject wave-induced lateral forces and attitude disturbances while tracking only a scalar vertical velocity reference is load-bearing for the robustness claim, yet no ablation isolating the low-level controller, no lateral tracking error statistics, and no full 6DOF coupling analysis under randomized wave spectra are provided.

    Authors: The design intentionally delegates lateral and attitude stabilization to a conventional controller so that the RL policy can operate on a low-dimensional vertical reference. We agree that explicit evidence for this separation would strengthen the robustness argument. The revised manuscript will include an ablation that isolates the low-level controller, lateral tracking error statistics, and a 6DOF coupling analysis under the randomized wave conditions already used in our simulations. revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on independent simulation outcomes


The paper defines a hierarchical controller (RL policy outputs scalar vertical velocity reference; conventional low-level controller handles attitude/lateral tracking) and validates it via simulation under randomized wave motions, reporting aggregate landing success and generalization. No derivation chain reduces by construction to its inputs, no fitted parameters are relabeled as predictions, and no load-bearing self-citations or uniqueness theorems appear. The empirical results are external to the method definition and do not rely on self-referential fitting or renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.1-grok · 5720 in / 1028 out tokens · 23400 ms · 2026-07-03T20:38:39.919161+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 2 internal anchors

  [1] H. W. Lin, P. Wang, Z. Yang, K. C. Leung, F. Bao, K. Y. Kui, J. X. E. Xu, and L. Shi, “Coastal underwater evidence search system with surface-underwater collaboration,” in IEEE International Conference on Control, Automation, Robotics and Vision, pp. 1047–1053, 2024.

  [2] H.-T. Zhang, B.-B. Hu, B. Liu, J. Ding, J. Zhao, H. Su, Y. Zhang, C. Zhu, Y. Yuan, and Y. Shi, “Aerial-marine cross-domain uncrewed systems: An overview of cyberphysical coordination frameworks for marine applications,” IEEE Control Systems, vol. 45, no. 4, pp. 28–45, 2025.

  [3] S. Li, Y. Zhu, G. Guo, P. Yuan, and J. Bai, “A separation and rendezvous control method for the UAV-USV system based on distributed NMPC,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 11, pp. 7251–7263, 2024.

  [4] P. Wang, C. Wang, J. Wang, and M. Q.-H. Meng, “Quadrotor autonomous landing on moving platform,” Procedia Computer Science, vol. 209, pp. 40–49, 2022.

  [5] P. M. Gupta, É. Pairet, T. Nascimento, and M. Saska, “Landing a UAV in harsh winds and turbulent open waters,” IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 744–751, 2023.

  [6] R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton, and A. Cangelosi, “Toward end-to-end control for UAV autonomous landing via deep reinforcement learning,” in 2018 International Conference on Unmanned Aircraft Systems, pp. 115–123, 2018.

  [7] W. Li, Y. Ge, Z. Guan, and G. Ye, “Synchronized motion-based UAV-USV cooperative autonomous landing,” Journal of Marine Science and Engineering, vol. 10, no. 9, p. 1214, 2022.

  [8] W. Li, Y. Ge, Z. Guan, H. Gao, and H. Feng, “NMPC-based UAV-USV cooperative tracking and landing,” Journal of the Franklin Institute, vol. 360, no. 11, pp. 7481–7500, 2023.

  [9] R. Xu, C. Liu, Z. Cao, Y. Wang, and H. Qian, “A manipulator-assisted multiple UAV landing system for USV subject to disturbance,” Ocean Engineering, vol. 299, p. 117306, 2024.

  [10] A. Rodriguez-Ramos, C. Sampedro, H. Bavle, P. de la Puente, and P. Campoy, “A deep reinforcement learning strategy for UAV autonomous landing on a moving platform,” Journal of Intelligent & Robotic Systems, vol. 93, pp. 351–366, 2019.

  [11] P. Goldschmid and A. Ahmad, “Reinforcement learning based autonomous multi-rotor landing on moving platforms,” Autonomous Robots, vol. 48, no. 4, p. 13, 2024.

  [12] V. Saj, B. Lee, D. Kalathil, and M. Benedict, “Robust reinforcement learning algorithm for vision-based ship landing of UAVs,” arXiv preprint arXiv:2209.08381, 2022.

  [13] R. Peter, L. Ratnabala, D. Aschu, A. Fedoseev, and D. Tsetserukou, “Lander.ai: Adaptive landing behavior agent for expertise in 3D dynamic platform landings,” arXiv preprint arXiv:2403.06572, 2024.

  [14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.

  [15] G. E. Uhlenbeck and L. S. Ornstein, “On the theory of the Brownian motion,” Physical Review, vol. 36, no. 5, pp. 823–841, 1930.

  [16] M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano-Muñoz, X. Yao, R. Zurbrügg, N. Rudin, et al., “Isaac Lab: A GPU-accelerated simulation framework for multi-modal robot learning,” arXiv preprint arXiv:2511.04831, 2025.

  [17] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

  [18] C. Schwarke, M. Mittal, N. Rudin, D. Hoeller, and M. Hutter, “RSL-RL: A learning library for robotics research,” arXiv preprint arXiv:2509.10771, 2025.

  [19] M. Jacinto, J. Pinto, J. Patrikar, J. Keller, R. Cunha, S. Scherer, and A. Pascoal, “Pegasus simulator: An Isaac Sim framework for multiple aerial vehicles simulation,” in 2024 International Conference on Unmanned Aircraft Systems, pp. 917–922, 2024.

  [20] S. Macenski et al., “Robot Operating System 2: Design, architecture, and uses in the wild,” Science Robotics, vol. 7, no. 66, p. eabm6074, 2022.

  [21] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033, 2012.