Cross-Platform Control for Autonomous Surface Vehicles via Adaptive Reinforcement Learning

Aswin Ramachandran; Raffaello D'Andrea; Ruiheng Jiang; Thomas Bi

arxiv: 2607.02037 · v1 · pith:BJDIZ7H2new · submitted 2026-07-02 · 💻 cs.RO · cs.LG

Pith reviewed 2026-07-03 11:51 UTC · model grok-4.3

classification 💻 cs.RO cs.LG

keywords reinforcement learning · autonomous surface vehicles · trajectory tracking · cross-platform generalization · zero-shot deployment · adaptive control · teacher-student architecture

The pith

A single reinforcement learning policy trained in simulation generalizes to multiple real-world autonomous surface vehicles without fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that an adaptive reinforcement learning controller, trained once under randomized vessel dynamics in a basic simulation, can track trajectories on different physical boats without any retraining or platform-specific adjustments. It handles unknown dynamics by conditioning the policy on interaction history through a teacher-student module that infers a latent representation of each platform. This approach matters because most controllers require separate design and tuning for each vehicle due to differences in shape, mass, and water interaction. If the claim holds, one learned policy could serve varied platforms directly from simulation. Real-world tests on two platforms show the adaptive policy reduces position error by up to 58 percent compared with non-adaptive learning baselines and nearly matches a tuned controller built for one platform.

Core claim

The authors show that a policy trained in simulation with randomized vessel dynamics, using a teacher-student architecture to infer latent platform dynamics from interaction history, transfers zero-shot to two distinct real autonomous surface vehicles. The policy achieves position mean absolute error up to 58 percent lower than non-adaptive learning baselines while approaching the accuracy of a platform-specific tuned controller, even though the training model is a simple analytical dynamics approximation rather than a high-fidelity hydrodynamic simulator.
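The headline number is a relative reduction in position mean absolute error. As a rough illustration of the arithmetic, the sketch below assumes that position MAE is the mean Euclidean distance between reference and measured positions; this page does not spell out the paper's exact definition. The function names and all numeric values are hypothetical and are not data from the paper.

```python
import math

def position_mae(reference, measured):
    """Mean Euclidean distance between paired (x, y) positions, in the input units."""
    errors = [math.dist(r, m) for r, m in zip(reference, measured)]
    return sum(errors) / len(errors)

def relative_reduction(baseline_mae, adaptive_mae):
    """Fraction by which the adaptive policy lowers MAE relative to the baseline."""
    return (baseline_mae - adaptive_mae) / baseline_mae

# Hypothetical three-sample trajectory (metres), for illustration only.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
measured = [(0.0, 0.03), (1.0, -0.04), (2.04, 0.03)]
mae = position_mae(reference, measured)

# Hypothetical MAEs: 5.0 cm for a baseline against 2.1 cm for the adaptive policy
# corresponds to a reduction of about 58%.
reduction = relative_reduction(5.0, 2.1)
```

Because the paper reports the gain as "up to" 58%, the figure is presumably the best case across platforms and baselines rather than an average.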

What carries the argument

The teacher-student architecture that infers a latent representation of unknown platform dynamics from interaction history and conditions the policy on that representation.
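The two-phase idea can be sketched on a toy problem. The example below is a minimal sketch under strong assumptions, not the paper's implementation: the vessel is reduced to a one-dimensional surge model with a single unknown damping coefficient, the teacher encoder is a fixed affine map standing in for a learned network, and the student is a one-variable least-squares fit instead of a neural adapter. All names (`rollout`, `teacher_encoder`, `history_feature`, `fit_student`) are hypothetical.

```python
import random

DT = 0.1  # integration step in seconds (hypothetical)

def rollout(damping, steps=50, seed=0):
    """Simulate v' = u - damping * v under random inputs; return (v, u, v_next) tuples."""
    rng = random.Random(seed)
    v, history = 0.0, []
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        v_next = v + DT * (u - damping * v)
        history.append((v, u, v_next))
        v = v_next
    return history

def teacher_encoder(damping):
    """Stand-in for the Phase 1 encoder: privileged parameter -> latent z."""
    return 2.0 * damping - 1.0

def history_feature(history):
    """Summarise the interaction history as one scalar (least-squares damping estimate)."""
    num = sum((u - (v_next - v) / DT) * v for v, u, v_next in history)
    den = sum(v * v for v, _, _ in history)
    return num / den

def fit_student(dampings):
    """Phase 2: regress the teacher latent onto history features (1-D least squares)."""
    xs = [history_feature(rollout(d, seed=i)) for i, d in enumerate(dampings)]
    zs = [teacher_encoder(d) for d in dampings]
    mean_x, mean_z = sum(xs) / len(xs), sum(zs) / len(zs)
    slope = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_z - slope * mean_x

# Train the student on simulated platforms where the privileged parameter is known.
slope, intercept = fit_student([0.2, 0.5, 1.0, 1.5])

# Deployment on an unseen platform: only the interaction history is available.
z_hat = slope * history_feature(rollout(0.7, seed=99)) + intercept
```

In this noise-free toy the student recovers the teacher latent almost exactly; on real hardware the history would carry sensor noise and unmodeled effects, which is where the quality of the inferred latent becomes an empirical question.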

If this is right

A single policy can be deployed across multiple autonomous surface vehicle platforms without per-platform retraining or tuning.
Training can use a basic analytical dynamics model instead of detailed hydrodynamic simulators while still achieving real-world generalization.
Adaptive policies conditioned on history can close much of the performance gap to hand-tuned platform-specific controllers.
Cross-platform transfer succeeds despite differences in actuation and hydrodynamic characteristics between the test vehicles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same history-conditioning technique could support zero-shot transfer on other vehicle types whose dynamics vary across instances, such as different drone frames or ground robots.
Widening the randomization ranges or adding more platform parameters in simulation might further reduce the remaining gap to tuned controllers.
The results suggest that handling partial observability via latent inference may matter more for sim-to-real success than using high-fidelity physics models.

Load-bearing premise

Randomizing vessel dynamics inside a simple analytical simulation model produces a policy that generalizes to the complex unmodeled hydrodynamic effects on real unseen platforms.
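To make the premise concrete, the following is a minimal sketch of dynamics randomization over a simple analytical model. It assumes a decoupled 3-DoF (surge, sway, yaw) model with linear and quadratic damping and omits Coriolis and coupling terms for brevity; the paper's actual model, parameter set, and ranges are not given on this page, so `RANGES`, `VesselParams`, `sample_params`, and `step` are all hypothetical.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical uniform ranges per axis (surge, sway, yaw); not the paper's values.
RANGES = {
    "inertia": [(8.0, 20.0), (10.0, 30.0), (1.0, 4.0)],
    "lin_damp": [(2.0, 10.0), (5.0, 20.0), (0.5, 3.0)],
    "quad_damp": [(0.0, 5.0), (0.0, 10.0), (0.0, 2.0)],
}

@dataclass
class VesselParams:
    inertia: list    # rigid-body plus added mass or inertia, per axis
    lin_damp: list   # linear damping coefficients, per axis
    quad_damp: list  # quadratic damping coefficients, per axis

def sample_params(rng):
    """Draw one randomized vessel, uniformly and independently per parameter."""
    return VesselParams(
        **{name: [rng.uniform(lo, hi) for lo, hi in axes] for name, axes in RANGES.items()}
    )

def step(state, tau, p, dt=0.05):
    """One Euler step of the decoupled model; tau is the body-frame force and moment."""
    x, y, psi, u, v, r = state
    nu = [u, v, r]
    nu_next = [
        nu[i] + dt * (tau[i] - p.lin_damp[i] * nu[i]
                      - p.quad_damp[i] * abs(nu[i]) * nu[i]) / p.inertia[i]
        for i in range(3)
    ]
    # Rotate the body-frame velocity into the world frame to integrate position.
    x += dt * (nu_next[0] * math.cos(psi) - nu_next[1] * math.sin(psi))
    y += dt * (nu_next[0] * math.sin(psi) + nu_next[1] * math.cos(psi))
    psi += dt * nu_next[2]
    return (x, y, psi, *nu_next)

rng = random.Random(0)
params = sample_params(rng)  # in training, a new vessel would be drawn per episode
state = (0.0,) * 6
for _ in range(100):
    state = step(state, tau=(5.0, 0.0, 0.0), p=params)
```

A model this simple has no wave forces or cross-axis coupling, which is exactly why the premise is load-bearing: the policy only ever sees variation along the axes that the sampler exposes.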

What would settle it

Deploy the same policy on a third autonomous surface vehicle whose dynamics lie outside the randomization ranges used in training and measure whether position tracking error remains comparable to the tuned controller.
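Selecting such a third platform requires knowing which of its parameters fall outside the training support. A minimal sketch of that check follows, assuming each randomized parameter was drawn from a closed interval; the parameter names, intervals, and platform estimates are hypothetical.

```python
def outside_support(platform, ranges):
    """Return the names of parameters that fall outside their training interval."""
    return [
        name for name, value in platform.items()
        if not (ranges[name][0] <= value <= ranges[name][1])
    ]

# Hypothetical training intervals and platform estimates, for illustration only.
training_ranges = {
    "mass_kg": (8.0, 20.0),
    "surge_damping": (2.0, 10.0),
    "yaw_inertia": (1.0, 4.0),
}
platform_a = {"mass_kg": 12.5, "surge_damping": 6.0, "yaw_inertia": 2.2}
platform_c = {"mass_kg": 35.0, "surge_damping": 6.0, "yaw_inertia": 5.1}

in_distribution = outside_support(platform_a, training_ranges)      # []
out_of_distribution = outside_support(platform_c, training_ranges)  # ['mass_kg', 'yaw_inertia']
```

A per-parameter interval check is only a first pass: a platform can sit inside every interval and still exhibit effects, such as wave-induced forces, that the analytical model has no parameter for.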

Figures

Figures reproduced from arXiv: 2607.02037 by Aswin Ramachandran, Raffaello D'Andrea, Ruiheng Jiang, Thomas Bi.

**Figure 2.** Coordinate frames and velocity components for the 3-DoF [PITH_FULL_IMAGE:figures/full_fig_p002_2.png]

**Figure 3.** Teacher–student architecture. In Phase 1, the encoder maps the privileged dynamics vector [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]

**Figure 4.** Scree plot of the learned latent $z$ over the dynamics. The first two principal components capture 91% of the variance, indicating that the nine-dimensional latent occupies a low-dimensional subspace and that matching $d_z$ to $d_e$ is not a binding constraint.

**Figure 6.** Online adaptation of the student's inferred latent [PITH_FULL_IMAGE:figures/full_fig_p007_6.png]

**Figure 5.** Teacher latent component $z_0$ as a function of the uniform dynamics scaling factor $s$, for the controlled one-dimensional sweep variant.

Acknowledgments (from the paper): The authors thank Sebastian Burmester, Jan Kamm, and Noa Sendlhofer for their help with experimental testing and for valuable discussions.

Original abstract

Autonomous surface vehicles vary widely in hydrodynamic and actuation characteristics, yet most controllers are designed for single-platform deployment. We present an adaptive reinforcement learning approach for trajectory tracking that enables zero-shot cross-platform deployment using a single policy. Since the deployment platform's dynamics are unknown to the policy, we address cross-platform generalization with the standard partial-observability approach of conditioning on interaction history, employing a teacher-student architecture in which a learned module infers a latent representation of the platform dynamics. The policy is trained in simulation under randomized vessel dynamics and is deployed zero-shot to two real-world platforms without any fine-tuning, despite relying on a simple analytical dynamics model rather than a high-fidelity hydrodynamic simulator. In real-world experiments on two different platforms, the adaptive policy outperforms non-adaptive learning-based baselines by up to 58% in position mean absolute error while approaching the tracking accuracy of a platform-specific tuned controller.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows one RL policy with history-based latent inference transferring zero-shot to two real ASVs and beating non-adaptive baselines, but the simple analytical sim for randomization is the part that needs scrutiny.

The main thing to know is that a single policy trained under randomized simple vessel dynamics plus a teacher-student module for inferring latent platform properties from history tracks trajectories on two different real ASVs without any fine-tuning, cutting position MAE by up to 58% versus non-adaptive RL baselines and getting close to a platform-specific tuned controller.

What is new is the concrete zero-shot cross-platform result on actual marine hardware using the standard partial-observability approach of conditioning on interaction history. The paper does well by reporting real deployments on two vessels rather than stopping at simulation, which gives the claim some grounding.

The soft spot is the training distribution. Randomizing parameters inside a basic analytical dynamics model may not span the real hydrodynamic effects (nonlinear damping, wave forces, platform-specific interactions) that appear on unseen vessels. The two successful platforms are encouraging, but they do not by themselves confirm that the randomization covered the relevant support; if the latent module is mostly fitting sim artifacts, broader testing could show larger gaps. The abstract also omits training details, error bars, statistical tests, and how the randomization ranges were chosen, which makes it harder to judge how much of the gain comes from the method versus careful tuning of the sim.

No obvious circularity or load-bearing fitting issues show up. The work is for people in marine robotics and sim-to-real RL who want practical transfer examples. A reader focused on hardware validation will get value from the experiments.

It deserves serious referee time because the real-world results are there and the method is clearly described enough to evaluate.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce an adaptive RL policy for ASV trajectory tracking that achieves zero-shot cross-platform deployment. It conditions the policy on interaction history via a teacher-student architecture that infers a latent representation of unknown vessel dynamics. Training occurs in simulation by randomizing parameters of a simple analytical dynamics model; the resulting policy is deployed without fine-tuning on two distinct real-world platforms. Experiments report that the adaptive policy reduces position mean absolute error by up to 58% relative to non-adaptive learning baselines while approaching the accuracy of a platform-specific tuned controller.

Significance. If substantiated with adequate experimental detail, the result would demonstrate that standard partial-observability techniques combined with dynamics randomization in a low-fidelity analytical model can produce policies that transfer to real, unmodeled hydrodynamics across platforms. This is noteworthy because it avoids both per-platform retuning and high-fidelity simulators, directly addressing a practical barrier in ASV deployment. The real-world validation on two platforms supplies concrete evidence of transfer, which strengthens the contribution relative to purely simulated studies.

major comments (2)

[Abstract and §5 (Real-world Experiments)] The central zero-shot claim rests on quantitative gains (up to 58% MAE reduction), yet the paper supplies no training hyperparameters, randomization ranges for the analytical model parameters, number of trials, error bars, statistical tests, or exclusion criteria. These omissions are load-bearing because they prevent independent verification that the reported improvement is robust rather than an artifact of a narrow test set or favorable randomization.
[§4 (Simulation Training)] The method randomizes vessel dynamics inside a simple analytical model to produce a policy that generalizes to real hydrodynamic effects on unseen platforms. No explicit ranges, sampling distributions, or coverage analysis for parameters such as damping or added mass are provided. The assumption that this randomization covers real dynamics is load-bearing for the cross-platform claim; if the randomization support does not intersect the distribution of real nonlinear damping and wave-induced forces, the observed transfer on two platforms does not establish broader generalization.

minor comments (1)

[§3] Notation for the latent state inferred by the teacher module is introduced without an explicit equation linking it to the policy input; a single clarifying equation in §3 would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review. The comments correctly identify omissions that affect reproducibility and the strength of the generalization claim. We address each point below and will revise the manuscript to supply the requested details.

Referee: [Abstract and §5 (Real-world Experiments)] The central zero-shot claim rests on quantitative gains (up to 58% MAE reduction), yet the paper supplies no training hyperparameters, randomization ranges for the analytical model parameters, number of trials, error bars, statistical tests, or exclusion criteria. These omissions are load-bearing because they prevent independent verification that the reported improvement is robust rather than an artifact of a narrow test set or favorable randomization.

Authors: We agree these details are required for independent verification. In the revised manuscript we will add, in §5 and a new appendix, the full training hyperparameters, the exact randomization ranges and sampling distributions for all analytical-model parameters, the number of real-world trials per platform, error bars (standard deviation across trials), results of paired statistical tests on the MAE differences, and any trial-exclusion criteria that were applied. The 58% figure will be reported with these supporting statistics. Revision: yes.

Referee: [§4 (Simulation Training)] The method randomizes vessel dynamics inside a simple analytical model to produce a policy that generalizes to real hydrodynamic effects on unseen platforms. No explicit ranges, sampling distributions, or coverage analysis for parameters such as damping or added mass are provided. The assumption that this randomization covers real dynamics is load-bearing for the cross-platform claim; if the randomization support does not intersect the distribution of real nonlinear damping and wave-induced forces, the observed transfer on two platforms does not establish broader generalization.

Authors: We accept that explicit randomization details and coverage analysis are necessary. Section 4 will be expanded to list the precise ranges and sampling distributions (uniform over intervals) for every randomized parameter, including linear and quadratic damping coefficients and added-mass terms. We will also add a short coverage discussion that compares the simulated parameter support against typical literature values for the two real platforms; any gaps relative to unmodeled nonlinear or wave effects will be noted as a limitation. Revision: yes.
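The error bars and paired tests promised in the first response could be computed along the following lines. This is a sketch assuming one position MAE per trial for each controller, with trials paired by trajectory and platform; it uses an exact sign-flip permutation test as one reasonable choice of paired test, and the rebuttal does not say which test the authors intend. All per-trial values are hypothetical.

```python
import itertools
import statistics

def paired_sign_flip_pvalue(baseline, adaptive):
    """Exact two-sided sign-flip permutation test on paired per-trial differences."""
    diffs = [b - a for b, a in zip(baseline, adaptive)]
    observed = abs(sum(diffs))
    count = total = 0
    # Under the null hypothesis each paired difference is equally likely to flip sign.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical per-trial position MAE in centimetres; not data from the paper.
baseline_mae = [5.2, 4.7, 5.5, 5.0, 4.9, 5.3]
adaptive_mae = [2.2, 2.0, 2.5, 2.1, 1.9, 2.3]

mean_adaptive = statistics.mean(adaptive_mae)
std_adaptive = statistics.stdev(adaptive_mae)  # sample standard deviation for error bars
p_value = paired_sign_flip_pvalue(baseline_mae, adaptive_mae)
```

With six paired trials the smallest achievable two-sided p-value under this test is 2/64, so the number of trials per platform bounds how strong a statistical statement the revision can make.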

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper trains a policy via standard RL under randomized analytical dynamics, uses a teacher-student latent module for history-based inference (standard partial-observability), and reports empirical zero-shot transfer metrics on real platforms. No equation or claim reduces a reported performance quantity to a fitted parameter or self-citation by construction. The central result is an experimental outcome, not a definitional identity or renamed input.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies insufficient technical detail to enumerate specific free parameters, axioms, or invented entities; the approach relies on standard RL and partial-observability techniques whose details are not expanded here.

