Cross-Platform Control for Autonomous Surface Vehicles via Adaptive Reinforcement Learning
Pith reviewed 2026-07-03 11:51 UTC · model grok-4.3
The pith
A single reinforcement learning policy trained in simulation generalizes to multiple real-world autonomous surface vehicles without fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors show that a policy trained in simulation with randomized vessel dynamics, using a teacher-student architecture to infer latent platform dynamics from interaction history, transfers zero-shot to two distinct real autonomous surface vehicles. The policy achieves position mean absolute error up to 58 percent lower than non-adaptive learning baselines while approaching the accuracy of a platform-specific tuned controller, even though the training model is a simple analytical dynamics approximation rather than a high-fidelity hydrodynamic simulator.
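This is a minimal sketch of how the headline figure could be computed. It assumes that position MAE is the mean Euclidean distance between actual and reference positions over a run; the paper may instead define it per axis. The function names and the numbers in the example are hypothetical, not results from the paper.

```python
import math


def position_mae(actual, reference):
    """Mean Euclidean position error over a trajectory.

    Assumption: 'position MAE' = mean of ||p_t - p_ref_t|| over time steps.
    """
    if len(actual) != len(reference) or not actual:
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.dist(p, r) for p, r in zip(actual, reference)]
    return sum(errors) / len(errors)


def relative_reduction(mae_baseline, mae_adaptive):
    """Fractional error reduction of the adaptive policy relative to a baseline."""
    return (mae_baseline - mae_adaptive) / mae_baseline


# Hypothetical numbers: a baseline MAE of 1.00 m and an adaptive MAE of
# 0.42 m correspond to a 58% reduction.
print(f"{relative_reduction(1.00, 0.42):.0%}")  # 58%
```

The reduction is measured against each baseline's own error. "Up to 58%" is therefore the largest such ratio across the comparisons, not a uniform gain.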
What carries the argument
The teacher-student architecture that infers a latent representation of unknown platform dynamics from interaction history and conditions the policy on that representation.
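The page does not spell out the architecture's internals. The sketch below assumes the common teacher-student pattern. In that pattern, a teacher is trained with access to the simulator's true vessel parameters, and a student adapter learns to reproduce the teacher's latent code from recent interaction history alone. Only the student-side data path is shown. `HistoryBuffer` and `distillation_loss` are hypothetical helpers, and the adapter network itself is omitted.

```python
from collections import deque


class HistoryBuffer:
    """Fixed-length window of recent (observation, action) pairs.

    Hypothetical helper: the student adapter reads this window, not the
    true vessel parameters, to estimate the latent dynamics code.
    """

    def __init__(self, length, obs_dim, act_dim):
        self.length = length
        self.pad = [0.0] * (obs_dim + act_dim)  # filler until the window is full
        self.steps = deque(maxlen=length)       # oldest entry drops automatically

    def push(self, obs, act):
        self.steps.append(list(obs) + list(act))

    def flat(self):
        """Oldest-to-newest concatenation, left-padded with zeros."""
        missing = self.length - len(self.steps)
        rows = [self.pad] * missing + list(self.steps)
        return [x for row in rows for x in row]


def distillation_loss(z_student, z_teacher):
    """Mean squared error pulling the student's estimate toward the teacher latent."""
    return sum((s - t) ** 2 for s, t in zip(z_student, z_teacher)) / len(z_teacher)


buf = HistoryBuffer(length=3, obs_dim=2, act_dim=1)
buf.push([0.1, 0.0], [0.5])
buf.push([0.2, 0.1], [0.4])
print(buf.flat())  # one zero-padded row, then the two pushed steps
print(distillation_loss([1.0, 2.0], [0.0, 0.0]))  # 2.5
```

At deployment only the buffer, the adapter, and the policy run. This is why the policy can be used on a vessel whose parameters are never measured.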
If this is right
- A single policy can be deployed across multiple autonomous surface vehicle platforms without per-platform retraining or tuning.
- Training can use a basic analytical dynamics model instead of detailed hydrodynamic simulators while still achieving real-world generalization.
- Adaptive policies conditioned on history can close much of the performance gap to hand-tuned platform-specific controllers.
- Cross-platform transfer succeeds despite differences in actuation and hydrodynamic characteristics between the test vehicles.
Where Pith is reading between the lines
- The same history-conditioning technique could support zero-shot transfer on other vehicle types whose dynamics vary across instances, such as different drone frames or ground robots.
- Widening the randomization ranges or adding more platform parameters in simulation might further reduce the remaining gap to tuned controllers.
- The results suggest that handling partial observability through latent inference may matter more for sim-to-real success than physics-model fidelity, although no direct comparison against a high-fidelity simulator is reported here.
Load-bearing premise
Randomizing vessel dynamics inside a simple analytical simulation model produces a policy that generalizes to the complex unmodeled hydrodynamic effects on real unseen platforms.
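The premise is easier to judge with a concrete model in view. The page does not give the paper's equations, so the sketch below assumes a surge-only (1-DOF) vessel with added mass plus linear and quadratic damping, a simplification in the spirit of Fossen-style maneuvering models. All parameter names and values are hypothetical. Under constant thrust, vessels with different parameters settle at different speeds.

```python
import math


def surge_step(u, thrust, p, dt):
    """One explicit-Euler step of a hypothetical surge-only vessel model:

        (mass + added_mass) * du/dt = thrust - d_lin * u - d_quad * |u| * u
    """
    m_eff = p["mass"] + p["added_mass"]
    drag = p["d_lin"] * u + p["d_quad"] * abs(u) * u
    return u + dt * (thrust - drag) / m_eff


def simulate(p, thrust=10.0, dt=0.01, steps=5000):
    """Integrate from rest under constant thrust and return the final speed."""
    u = 0.0
    for _ in range(steps):
        u = surge_step(u, thrust, p, dt)
    return u


def steady_speed(p, thrust=10.0):
    """Closed-form forward speed at which drag balances thrust (thrust > 0)."""
    a, b = p["d_quad"], p["d_lin"]
    return (-b + math.sqrt(b * b + 4 * a * thrust)) / (2 * a)


# Two illustrative parameter sets, not the paper's platforms.
light = {"mass": 20.0, "added_mass": 2.0, "d_lin": 5.0, "d_quad": 5.0}
heavy = {"mass": 60.0, "added_mass": 9.0, "d_lin": 10.0, "d_quad": 10.0}
for name, p in [("light", light), ("heavy", heavy)]:
    print(name, round(simulate(p), 3), round(steady_speed(p), 3))
```

Real platforms add sway and yaw coupling, thruster dynamics, wind, and waves, none of which this model represents. The load-bearing premise is that randomizing such a model is enough to cover that gap.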
What would settle it
Deploy the same policy on a third autonomous surface vehicle whose dynamics lie outside the randomization ranges used in training, and measure whether its position tracking error remains comparable to that of the tuned controller.
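Designing this test requires a concrete meaning for "outside the randomization ranges". One simple criterion is a per-parameter box check of the third vessel's identified parameters against the training intervals. This criterion is hypothetical, as are the function name, the intervals, and the vessel values below.

```python
def out_of_range(params, ranges):
    """Report parameters lying outside their training intervals.

    Maps each offending parameter to its signed excess as a fraction of the
    interval width (negative: below the lower bound, positive: above it).
    """
    report = {}
    for name, value in params.items():
        lo, hi = ranges[name]
        width = hi - lo
        if value < lo:
            report[name] = (value - lo) / width
        elif value > hi:
            report[name] = (value - hi) / width
    return report


# Illustrative training intervals and identified parameters for a third vessel.
ranges = {"mass": (10.0, 90.0), "d_lin": (1.0, 21.0)}
third_vessel = {"mass": 130.0, "d_lin": 5.0}
print(out_of_range(third_vessel, ranges))  # {'mass': 0.5}
```

A box check is only a necessary condition. A vessel inside every interval can still exhibit effects that no parameter of the analytical model represents, such as differently shaped nonlinear damping or wave-induced forces.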
Original abstract
Autonomous surface vehicles vary widely in hydrodynamic and actuation characteristics, yet most controllers are designed for single-platform deployment. We present an adaptive reinforcement learning approach for trajectory tracking that enables zero-shot cross-platform deployment using a single policy. Since the deployment platform's dynamics are unknown to the policy, we address cross-platform generalization with the standard partial-observability approach of conditioning on interaction history, employing a teacher-student architecture in which a learned module infers a latent representation of the platform dynamics. The policy is trained in simulation under randomized vessel dynamics and is deployed zero-shot to two real-world platforms without any fine-tuning, despite relying on a simple analytical dynamics model rather than a high-fidelity hydrodynamic simulator. In real-world experiments on two different platforms, the adaptive policy outperforms non-adaptive learning-based baselines by up to 58% in position mean absolute error while approaching the tracking accuracy of a platform-specific tuned controller.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce an adaptive RL policy for ASV trajectory tracking that achieves zero-shot cross-platform deployment. It conditions the policy on interaction history via a teacher-student architecture that infers a latent representation of unknown vessel dynamics. Training occurs in simulation by randomizing parameters of a simple analytical dynamics model; the resulting policy is deployed without fine-tuning on two distinct real-world platforms. Experiments report that the adaptive policy reduces position mean absolute error by up to 58% relative to non-adaptive learning baselines while approaching the accuracy of a platform-specific tuned controller.
Significance. If substantiated with adequate experimental detail, the result would demonstrate that standard partial-observability techniques combined with dynamics randomization in a low-fidelity analytical model can produce policies that transfer to real, unmodeled hydrodynamics across platforms. This is noteworthy because it avoids both per-platform retuning and high-fidelity simulators, directly addressing a practical barrier in ASV deployment. The real-world validation on two platforms supplies concrete evidence of transfer, which strengthens the contribution relative to purely simulated studies.
major comments (2)
- [Abstract and §5] Abstract and §5 (Real-world Experiments): the central zero-shot claim rests on quantitative gains (up to 58% MAE reduction) yet supplies no training hyperparameters, randomization ranges for the analytical model parameters, number of trials, error bars, statistical tests, or exclusion criteria. These omissions are load-bearing because they prevent independent verification that the reported improvement is robust rather than an artifact of a narrow test set or favorable randomization.
- [§4] §4 (Simulation Training): the method randomizes vessel dynamics inside a simple analytical model to produce a policy that generalizes to real hydrodynamic effects on unseen platforms. No explicit ranges, sampling distributions, or coverage analysis for parameters such as damping or added mass are provided. This assumption is load-bearing for the cross-platform claim; if the randomization support does not intersect the distribution of real nonlinear damping and wave-induced forces, the observed transfer on two platforms does not establish broader generalization.
minor comments (1)
- [§3] Notation for the latent state inferred by the teacher module is introduced without an explicit equation linking it to the policy input; a single clarifying equation in §3 would improve readability.
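The equation below shows one way the clarifying equation could read. It assumes the encoder/adapter formulation common in teacher-student adaptation, for example rapid motor adaptation for legged robots. The symbols $e_t$, $\mu$, $\phi$, and $H$ are placeholders, not the paper's notation.

```latex
\begin{aligned}
z_t &= \mu(e_t) && \text{teacher encoder, privileged vessel parameters } e_t \\
\hat{z}_t &= \phi\left(s_{t-H:t-1},\, a_{t-H:t-1}\right) && \text{student adapter, last } H \text{ states and actions} \\
a_t &= \pi\left(s_t,\, z_t\right) && \text{training (teacher latent)} \\
a_t &= \pi\left(s_t,\, \hat{z}_t\right) && \text{deployment (student estimate)} \\
\mathcal{L}_{\text{adapt}} &= \left\lVert \hat{z}_t - z_t \right\rVert_2^2 && \text{student regression target}
\end{aligned}
```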
Simulated Author's Rebuttal
Thank you for the constructive review. The comments correctly identify omissions that affect reproducibility and the strength of the generalization claim. We address each point below and will revise the manuscript to supply the requested details.
Point-by-point responses
Referee: [Abstract and §5] Abstract and §5 (Real-world Experiments): the central zero-shot claim rests on quantitative gains (up to 58% MAE reduction) yet supplies no training hyperparameters, randomization ranges for the analytical model parameters, number of trials, error bars, statistical tests, or exclusion criteria. These omissions are load-bearing because they prevent independent verification that the reported improvement is robust rather than an artifact of a narrow test set or favorable randomization.
Authors: We agree these details are required for independent verification. In the revised manuscript we will add, in §5 and a new appendix, the full training hyperparameters, the exact randomization ranges and sampling distributions for all analytical-model parameters, the number of real-world trials per platform, error bars (standard deviation across trials), results of paired statistical tests on the MAE differences, and any trial-exclusion criteria that were applied. The 58% figure will be reported with these supporting statistics. revision: yes
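The promised statistics can be sketched with Python's standard library. The sketch assumes that each real-world trial runs both policies on the same reference trajectory, so that trials pair up. The per-trial MAE values are invented for illustration, and `summarize` and `paired_t` are hypothetical helpers.

```python
import math
import statistics


def summarize(maes):
    """Mean and sample standard deviation across trials (for error bars)."""
    return statistics.mean(maes), statistics.stdev(maes)


def paired_t(baseline, adaptive):
    """Paired t statistic on per-trial MAE differences (baseline - adaptive).

    Assumes trial i of both lists used the same reference trajectory.
    Returns (t, degrees_of_freedom); positive t favors the adaptive policy.
    """
    if len(baseline) != len(adaptive) or len(baseline) < 2:
        raise ValueError("need at least two paired trials")
    diffs = [b - a for b, a in zip(baseline, adaptive)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("identical differences across trials: t is undefined")
    return statistics.mean(diffs) / se, len(diffs) - 1


# Hypothetical per-trial position MAEs in metres, not data from the paper.
baseline = [1.0, 1.2, 0.9, 1.1]
adaptive = [0.5, 0.6, 0.5, 0.5]
print(summarize(adaptive))
print(paired_t(baseline, adaptive))
```

`scipy.stats.ttest_rel(baseline, adaptive)` computes the same statistic and also returns a p-value. With few trials per platform the degrees of freedom are small, so the p-value should come from the t distribution rather than a normal approximation.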
Referee: [§4] §4 (Simulation Training): the method randomizes vessel dynamics inside a simple analytical model to produce a policy that generalizes to real hydrodynamic effects on unseen platforms. No explicit ranges, sampling distributions, or coverage analysis for parameters such as damping or added mass are provided. This assumption is load-bearing for the cross-platform claim; if the randomization support does not intersect the distribution of real nonlinear damping and wave-induced forces, the observed transfer on two platforms does not establish broader generalization.
Authors: We accept that explicit randomization details and coverage analysis are necessary. Section 4 will be expanded to list the precise ranges and sampling distributions (uniform over intervals) for every randomized parameter, including linear and quadratic damping coefficients and added-mass terms. We will also add a short coverage discussion that compares the simulated parameter support against typical literature values for the two real platforms; any gaps relative to unmodeled nonlinear or wave effects will be noted as a limitation. revision: yes
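The sampling scheme described above, with independent uniform draws over per-parameter intervals, can be sketched as follows. The parameter names, units, and bounds are placeholders, not the values the revision will report. In a typical setup, a fresh vessel would be drawn at every episode reset.

```python
import random

# Placeholder intervals (low, high); the ranges the revision will report are
# not given on this page.
RANGES = {
    "mass": (10.0, 80.0),        # kg
    "added_mass": (0.5, 15.0),   # kg, surge added mass
    "d_lin": (1.0, 20.0),        # N s/m, linear damping coefficient
    "d_quad": (1.0, 25.0),       # N s^2/m^2, quadratic damping coefficient
}


def sample_vessel(rng, ranges=RANGES):
    """Draw one training vessel: each parameter independently uniform on its interval."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


rng = random.Random(0)  # fixed seed so the sampled fleet is reproducible
fleet = [sample_vessel(rng) for _ in range(1000)]
for name, (lo, hi) in RANGES.items():
    values = [v[name] for v in fleet]
    # Empirical support of the draws, for comparison with real platform values.
    print(f"{name}: sampled [{min(values):.2f}, {max(values):.2f}] within [{lo}, {hi}]")
```

The printed ranges give the simulated support that a coverage discussion would compare against literature or identified values for each real platform.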
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper trains a policy via standard RL under randomized analytical dynamics, uses a teacher-student latent module for history-based inference (standard partial-observability), and reports empirical zero-shot transfer metrics on real platforms. No equation or claim reduces a reported performance quantity to a fitted parameter or self-citation by construction. The central result is an experimental outcome, not a definitional identity or renamed input.