arxiv: 2607.02037 · v1 · pith:BJDIZ7H2new · submitted 2026-07-02 · 💻 cs.RO · cs.LG

Cross-Platform Control for Autonomous Surface Vehicles via Adaptive Reinforcement Learning

Pith reviewed 2026-07-03 11:51 UTC · model grok-4.3

classification 💻 cs.RO cs.LG
keywords reinforcement learning · autonomous surface vehicles · trajectory tracking · cross-platform generalization · zero-shot deployment · adaptive control · teacher-student architecture

The pith

A single reinforcement learning policy trained in simulation generalizes to multiple real-world autonomous surface vehicles without fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that an adaptive reinforcement learning controller, trained once under randomized vessel dynamics in a basic simulation, can track trajectories on different physical boats without any retraining or platform-specific adjustments. It handles unknown dynamics by conditioning the policy on interaction history through a teacher-student module that infers a latent representation of each platform. This approach matters because most controllers require separate design and tuning for each vehicle due to differences in shape, mass, and water interaction. If the claim holds, one learned policy could serve varied platforms directly from simulation. Real-world tests on two platforms show the adaptive policy reduces position mean absolute error by up to 58 percent compared with non-adaptive learning baselines and approaches the tracking accuracy of a platform-specific tuned controller.

Core claim

The authors show that a policy trained in simulation with randomized vessel dynamics, using a teacher-student architecture to infer latent platform dynamics from interaction history, transfers zero-shot to two distinct real autonomous surface vehicles. The policy achieves position mean absolute error up to 58 percent lower than non-adaptive learning baselines while approaching the accuracy of a platform-specific tuned controller, even though the training model is a simple analytical dynamics approximation rather than a high-fidelity hydrodynamic simulator.
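As a minimal sketch of how a figure such as "up to 58 percent lower position mean absolute error" could be computed, the snippet below treats position MAE as the mean Euclidean distance between reference and measured positions and reports the relative reduction between two controllers. The paper's exact metric definition is not given on this page, so the Euclidean form, the function names, and the sample trajectories are all assumptions for illustration.

```python
import math

def position_mae(reference, measured):
    """Mean Euclidean position error over a trajectory (assumed metric definition)."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("trajectories must be non-empty and equally long")
    errors = [math.dist(r, m) for r, m in zip(reference, measured)]
    return sum(errors) / len(errors)

def relative_reduction(baseline_mae, adaptive_mae):
    """Fractional error reduction of the adaptive policy relative to a baseline."""
    return (baseline_mae - adaptive_mae) / baseline_mae

# Hypothetical data: a straight-line reference and two tracked paths (metres).
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
baseline = [(0.0, 0.10), (1.0, 0.10), (2.0, 0.10), (3.0, 0.10)]
adaptive = [(0.0, 0.05), (1.0, 0.05), (2.0, 0.05), (3.0, 0.05)]

mae_base = position_mae(reference, baseline)
mae_adapt = position_mae(reference, adaptive)
print(f"baseline MAE {mae_base:.3f} m, adaptive MAE {mae_adapt:.3f} m")
print(f"relative reduction {relative_reduction(mae_base, mae_adapt):.0%}")
```

An "up to" figure would then be the largest such reduction across the platform and baseline pairs that were tested.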

What carries the argument

The teacher-student architecture that infers a latent representation of unknown platform dynamics from interaction history and conditions the policy on that representation.
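The data flow of such an architecture can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: random linear maps stand in for trained networks, the student is regressed onto the teacher latent in the style of RMA-type methods, and every size except the nine-dimensional latent mentioned in the Figure 4 caption is hypothetical.

```python
import random

random.seed(0)

# Assumed sizes: privileged vector, latent, observation, action, history length.
D_E, D_Z, D_OBS, D_ACT, HIST = 9, 9, 6, 3, 10

def linear(n_out, n_in):
    """Random weight matrix standing in for a trained network."""
    return [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(weights, x):
    """Apply a linear map to a flat list of inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

encoder = linear(D_Z, D_E)                      # teacher: privileged dynamics -> z
adapter = linear(D_Z, HIST * (D_OBS + D_ACT))   # student: interaction history -> z_hat
policy = linear(D_ACT, D_OBS + D_Z)             # policy: (observation, latent) -> action

# Phase 1 (simulation only): the encoder sees the true randomized dynamics.
dynamics = [random.uniform(0.5, 1.5) for _ in range(D_E)]
z = forward(encoder, dynamics)

# Phase 2: the student infers the latent from recent observations and actions.
history = [random.gauss(0.0, 1.0) for _ in range(HIST * (D_OBS + D_ACT))]
z_hat = forward(adapter, history)
loss = sum((a - b) ** 2 for a, b in zip(z_hat, z)) / D_Z  # regression target

# Deployment: the policy is conditioned on the inferred latent only.
obs = [0.0] * D_OBS
action = forward(policy, obs + z_hat)
print(f"latent regression loss {loss:.4f}, action {action}")
```

The point of the split is that the privileged dynamics vector is only available in simulation, while the interaction history is available on any real platform.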

If this is right

  • A single policy can be deployed across multiple autonomous surface vehicle platforms without per-platform retraining or tuning.
  • Training can use a basic analytical dynamics model instead of detailed hydrodynamic simulators while still achieving real-world generalization.
  • Adaptive policies conditioned on history can close much of the performance gap to hand-tuned platform-specific controllers.
  • Cross-platform transfer succeeds despite differences in actuation and hydrodynamic characteristics between the test vehicles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same history-conditioning technique could support zero-shot transfer on other vehicle types whose dynamics vary across instances, such as different drone frames or ground robots.
  • Widening the randomization ranges or adding more platform parameters in simulation might further reduce the remaining gap to tuned controllers.
  • The results suggest that handling partial observability via latent inference may be more critical for sim-to-real success than using high-fidelity physics models.

Load-bearing premise

Randomizing vessel dynamics inside a simple analytical simulation model produces a policy that generalizes to the complex unmodeled hydrodynamic effects on real unseen platforms.
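To make the premise concrete, here is a rough sketch of dynamics randomization over a simple analytical model. It assumes a decoupled 3-DoF model with linear damping only (no Coriolis terms, no nonlinear damping, no waves); the parameter names and ranges are hypothetical placeholders, since the page does not report the ones used in the paper.

```python
import random

def sample_vessel(rng):
    """Draw one randomized vessel; all ranges are hypothetical placeholders."""
    return {
        "mass": rng.uniform(5.0, 50.0),      # kg, surge/sway inertia incl. added mass
        "inertia": rng.uniform(0.5, 10.0),   # kg m^2, yaw inertia
        "d_surge": rng.uniform(1.0, 20.0),   # linear damping, surge
        "d_sway": rng.uniform(1.0, 20.0),    # linear damping, sway
        "d_yaw": rng.uniform(0.1, 5.0),      # linear damping, yaw
    }

def step(vel, force, p, dt=0.02):
    """One explicit-Euler step of the decoupled linear-damping model."""
    u, v, r = vel
    fx, fy, mz = force
    u += dt * (fx - p["d_surge"] * u) / p["mass"]
    v += dt * (fy - p["d_sway"] * v) / p["mass"]
    r += dt * (mz - p["d_yaw"] * r) / p["inertia"]
    return (u, v, r)

rng = random.Random(0)
for episode in range(3):
    params = sample_vessel(rng)      # a new "platform" every episode
    vel = (0.0, 0.0, 0.0)
    for _ in range(500):             # 10 s of constant surge force
        vel = step(vel, (5.0, 0.0, 0.0), params)
    print(f"episode {episode}: surge speed after 10 s = {vel[0]:.3f} m/s")
```

The same commanded force produces a different response in each episode, which is the variation a history-conditioned policy has to identify and compensate for.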

What would settle it

Deploy the same policy on a third autonomous surface vehicle whose dynamics lie outside the randomization ranges used in training and measure whether position tracking error remains comparable to the tuned controller.
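Selecting such a third platform presupposes a check of its identified parameters against the training ranges. A minimal sketch, with hypothetical parameter names, ranges, and platform values throughout:

```python
def outside_support(platform, ranges):
    """Return the parameters of `platform` that fall outside the training ranges."""
    return {
        name: value
        for name, value in platform.items()
        if not (ranges[name][0] <= value <= ranges[name][1])
    }

# Hypothetical training ranges and identified parameters of a third platform.
training_ranges = {"mass": (5.0, 50.0), "d_surge": (1.0, 20.0), "d_yaw": (0.1, 5.0)}
third_platform = {"mass": 80.0, "d_surge": 12.0, "d_yaw": 6.5}

violations = outside_support(third_platform, training_ranges)
print("out-of-range parameters:", violations)
```

A platform with a non-empty result would be a genuine out-of-distribution test; one with an empty result would only re-test interpolation within the training support.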

Figures

Figures reproduced from arXiv: 2607.02037 by Aswin Ramachandran, Raffaello D'Andrea, Ruiheng Jiang, Thomas Bi.

Figure 1. Real-world experiment on Platform A, one of two ASV platforms
Figure 2. Coordinate frames and velocity components for the 3-DoF …
Figure 3. Teacher–student architecture. In Phase 1, the encoder maps the privileged dynamics vector …
Figure 4. Scree plot of the learned latent z over the dynamics. The first two principal components capture 91% of the variance, indicating that the nine-dimensional latent occupies a low-dimensional subspace and that matching d_z to d_e is not a binding constraint.
Figure 5. Teacher latent component z_0 as a function of the uniform dynamics-scaling factor s, for the controlled one-dimensional sweep variant.
Figure 6. Online adaptation of the student's inferred latent …

Original abstract

Autonomous surface vehicles vary widely in hydrodynamic and actuation characteristics, yet most controllers are designed for single-platform deployment. We present an adaptive reinforcement learning approach for trajectory tracking that enables zero-shot cross-platform deployment using a single policy. Since the deployment platform's dynamics are unknown to the policy, we address cross-platform generalization with the standard partial-observability approach of conditioning on interaction history, employing a teacher-student architecture in which a learned module infers a latent representation of the platform dynamics. The policy is trained in simulation under randomized vessel dynamics and is deployed zero-shot to two real-world platforms without any fine-tuning, despite relying on a simple analytical dynamics model rather than a high-fidelity hydrodynamic simulator. In real-world experiments on two different platforms, the adaptive policy outperforms non-adaptive learning-based baselines by up to 58% in position mean absolute error while approaching the tracking accuracy of a platform-specific tuned controller.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce an adaptive RL policy for ASV trajectory tracking that achieves zero-shot cross-platform deployment. It conditions the policy on interaction history via a teacher-student architecture that infers a latent representation of unknown vessel dynamics. Training occurs in simulation by randomizing parameters of a simple analytical dynamics model; the resulting policy is deployed without fine-tuning on two distinct real-world platforms. Experiments report that the adaptive policy reduces position mean absolute error by up to 58% relative to non-adaptive learning baselines while approaching the accuracy of a platform-specific tuned controller.

Significance. If substantiated with adequate experimental detail, the result would demonstrate that standard partial-observability techniques combined with dynamics randomization in a low-fidelity analytical model can produce policies that transfer to real, unmodeled hydrodynamics across platforms. This is noteworthy because it avoids both per-platform retuning and high-fidelity simulators, directly addressing a practical barrier in ASV deployment. The real-world validation on two platforms supplies concrete evidence of transfer, which strengthens the contribution relative to purely simulated studies.

major comments (2)
  1. [Abstract and §5] Abstract and §5 (Real-world Experiments): the central zero-shot claim rests on quantitative gains (up to 58% MAE reduction) yet supplies no training hyperparameters, randomization ranges for the analytical model parameters, number of trials, error bars, statistical tests, or exclusion criteria. These omissions are load-bearing because they prevent independent verification that the reported improvement is robust rather than an artifact of a narrow test set or favorable randomization.
  2. [§4] §4 (Simulation Training): the method randomizes vessel dynamics inside a simple analytical model to produce a policy that generalizes to real hydrodynamic effects on unseen platforms. No explicit ranges, sampling distributions, or coverage analysis for parameters such as damping or added mass are provided. This assumption is load-bearing for the cross-platform claim; if the randomization support does not intersect the distribution of real nonlinear damping and wave-induced forces, the observed transfer on two platforms does not establish broader generalization.
minor comments (1)
  1. [§3] Notation for the latent state inferred by the teacher module is introduced without an explicit equation linking it to the policy input; a single clarifying equation in §3 would improve readability.
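One possible shape for the clarifying equation requested here, offered only as a sketch: the symbols below (encoder $e_\phi$, adapter $g_\psi$, privileged dynamics vector $\mu$, history length $H$) are assumptions and may not match the paper's notation.

```latex
\begin{aligned}
  z &= e_\phi(\mu), \qquad \mu \in \mathbb{R}^{d_e},\; z \in \mathbb{R}^{d_z}
      && \text{(teacher, simulation only)} \\
  \hat{z}_t &= g_\psi\!\left(o_{t-H:t},\, a_{t-H:t-1}\right)
      && \text{(student, from interaction history)} \\
  a_t &\sim \pi_\theta\!\left(\cdot \mid o_t,\, \hat{z}_t\right)
      && \text{(policy input at deployment)}
\end{aligned}
```

During teacher training the policy would be conditioned on $z$ in place of $\hat{z}_t$.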

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review. The comments correctly identify omissions that affect reproducibility and the strength of the generalization claim. We address each point below and will revise the manuscript to supply the requested details.

  1. Referee: [Abstract and §5] Abstract and §5 (Real-world Experiments): the central zero-shot claim rests on quantitative gains (up to 58% MAE reduction) yet supplies no training hyperparameters, randomization ranges for the analytical model parameters, number of trials, error bars, statistical tests, or exclusion criteria. These omissions are load-bearing because they prevent independent verification that the reported improvement is robust rather than an artifact of a narrow test set or favorable randomization.

    Authors: We agree these details are required for independent verification. In the revised manuscript we will add, in §5 and a new appendix, the full training hyperparameters, the exact randomization ranges and sampling distributions for all analytical-model parameters, the number of real-world trials per platform, error bars (standard deviation across trials), results of paired statistical tests on the MAE differences, and any trial-exclusion criteria that were applied. The 58% figure will be reported with these supporting statistics. revision: yes
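The response does not say which paired test would be used. As one possible choice, the sketch below runs an exact two-sided sign-flip permutation test on per-trial MAE differences and reports their mean and standard deviation. The per-trial numbers are invented for illustration and are not results from the paper.

```python
import itertools
import statistics

def paired_sign_flip_test(baseline, adaptive):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [b - a for b, a in zip(baseline, adaptive)]
    observed = abs(statistics.mean(diffs))
    count = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate every sign assignment.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(statistics.mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:
            count += 1
    return statistics.mean(diffs), statistics.stdev(diffs), count / total

# Hypothetical per-trial position MAE values (metres), paired by trajectory.
baseline_mae = [0.21, 0.19, 0.24, 0.20, 0.22, 0.23]
adaptive_mae = [0.10, 0.11, 0.09, 0.12, 0.10, 0.11]

mean_diff, sd_diff, p_value = paired_sign_flip_test(baseline_mae, adaptive_mae)
print(f"mean difference {mean_diff:.3f} m, sd {sd_diff:.3f} m, p = {p_value:.4f}")
```

With only a handful of trials the smallest attainable p-value is bounded by the number of sign assignments, which is one reason the number of trials per platform matters for the claim.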

  2. Referee: [§4] §4 (Simulation Training): the method randomizes vessel dynamics inside a simple analytical model to produce a policy that generalizes to real hydrodynamic effects on unseen platforms. No explicit ranges, sampling distributions, or coverage analysis for parameters such as damping or added mass are provided. This assumption is load-bearing for the cross-platform claim; if the randomization support does not intersect the distribution of real nonlinear damping and wave-induced forces, the observed transfer on two platforms does not establish broader generalization.

    Authors: We accept that explicit randomization details and coverage analysis are necessary. Section 4 will be expanded to list the precise ranges and sampling distributions (uniform over intervals) for every randomized parameter, including linear and quadratic damping coefficients and added-mass terms. We will also add a short coverage discussion that compares the simulated parameter support against typical literature values for the two real platforms; any gaps relative to unmodeled nonlinear or wave effects will be noted as a limitation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper trains a policy via standard RL under randomized analytical dynamics, uses a teacher-student latent module for history-based inference (standard partial-observability), and reports empirical zero-shot transfer metrics on real platforms. No equation or claim reduces a reported performance quantity to a fitted parameter or self-citation by construction. The central result is an experimental outcome, not a definitional identity or renamed input.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies insufficient technical detail to enumerate specific free parameters, axioms, or invented entities; the approach relies on standard RL and partial-observability techniques whose details are not expanded here.
