
arxiv: 2607.01528 · v1 · pith:Q4B435WTnew · submitted 2026-07-01 · 💻 cs.LG · cs.SY · eess.SY

Wind-Aware Reinforcement Learning Control of a Small Quadrotor Using Learned Onboard Wind Estimation in Simulated Atmospheric Turbulence

Pith reviewed 2026-07-03 20:47 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY
keywords quadrotor control · wind estimation · reinforcement learning · trajectory tracking · atmospheric turbulence · onboard sensing · proximal policy optimization · simulation training

The pith

A learned onboard wind estimator lets a reinforcement learning controller cut a quadrotor's horizontal trajectory tracking error by 48 percent, relative to a wind-blind proportional-derivative baseline, in simulated turbulent winds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a two-stage pipeline: an attention-augmented gated recurrent network is first trained to estimate the local wind from onboard kinematics and dynamics alone, and its frozen estimate is then supplied to a proximal policy optimization controller. This matters because conventional feedback control loses tracking performance once turbulent winds become comparable to the vehicle's airspeed, which restricts small multirotors in the atmospheric boundary layer. The estimator recovers the horizontal wind vector to 0.40 m/s root-mean-square error and 3.2 degrees direction error on unseen wind regimes. The controller reduces horizontal tracking error by 48 percent relative to a wind-blind proportional-derivative baseline, winning on every evaluation episode across mean winds of 4 m/s to 12 m/s. The share of the improvement attributable to explicit wind perception grows with wind speed, from small in light winds toward roughly half the total gain in strong winds, consistent with the quadratic scaling of aerodynamic drag.

Core claim

An attention-augmented gated recurrent network, trained on thousands of simulated flights through von Karman turbulence with power-law shear and veer, recovers the horizontal wind vector with 0.40 m/s per-flight root-mean-square error and 3.2 degrees direction error on unseen wind regimes. Supplying the frozen estimator's output to a proximal policy optimization controller then reduces horizontal trajectory tracking error by 48 percent relative to a wind-blind proportional-derivative baseline across mean winds of 4 m/s to 12 m/s. The controller wins on 100 percent of evaluation episodes, and the wind-perception share of the gain increases with wind speed.
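The two estimator metrics quoted above can be made concrete with a short sketch. This is an illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, the RMSE is taken over the magnitude of the horizontal error vector (the paper may instead report a per-component value), and the direction error is a mean absolute angle with wrap-around.

```python
import math


def horizontal_wind_rmse(est, true):
    """Root-mean-square magnitude of the horizontal wind error over one flight.

    est, true: equal-length sequences of (u, v) wind components in m/s.
    Assumption: the error is measured on the 2-D vector, not per component.
    """
    if len(est) != len(true) or not est:
        raise ValueError("need two non-empty sequences of equal length")
    sq = [(ue - ut) ** 2 + (ve - vt) ** 2
          for (ue, ve), (ut, vt) in zip(est, true)]
    return math.sqrt(sum(sq) / len(sq))


def mean_direction_error_deg(est, true):
    """Mean absolute wind-direction error in degrees.

    Each angular difference is wrapped to [-180, 180) so that 179 deg versus
    -179 deg counts as a 2 deg error rather than 358 deg.
    """
    if len(est) != len(true) or not est:
        raise ValueError("need two non-empty sequences of equal length")
    errs = []
    for (ue, ve), (ut, vt) in zip(est, true):
        diff = math.degrees(math.atan2(ve, ue) - math.atan2(vt, ut))
        diff = (diff + 180.0) % 360.0 - 180.0
        errs.append(abs(diff))
    return sum(errs) / len(errs)


# Placeholder samples for illustration only (not data from the paper).
truth = [(8.0, 0.0), (8.0, 0.0)]
estimate = [(8.3, 0.4), (7.7, -0.4)]
rmse = horizontal_wind_rmse(estimate, truth)
direction_error = mean_direction_error_deg(estimate, truth)
```

A per-flight figure such as the reported 0.40 m/s would then be an aggregate of `horizontal_wind_rmse` over the held-out flights; how the paper aggregates across flights is not stated on this page.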

What carries the argument

A two-stage pipeline in which an attention-augmented gated recurrent network first estimates the local wind vector from onboard kinematics and dynamics, after which a proximal policy optimization controller incorporates the frozen estimator output.
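A minimal sketch of how the two stages could be wired together follows. It assumes the frozen estimator is simply a callable from a window of recent onboard samples to a horizontal wind estimate, and that the estimate is appended to the policy's observation vector. All names (`build_policy_observation`, `stand_in_estimator`) and the observation layout are hypothetical; the paper's actual network and observation design are not described on this page.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Window = Sequence[Sequence[float]]  # recent onboard samples, oldest first
WindEstimator = Callable[[Window], Tuple[float, float]]


def build_policy_observation(
    state: Sequence[float],
    tracking_error: Sequence[float],
    window: Window,
    estimator: Optional[WindEstimator],
) -> List[float]:
    """Assemble the controller's observation vector.

    With estimator=None the observation is wind-blind. Otherwise the frozen
    estimator is only queried here; its parameters are never updated by the
    controller's training loop.
    """
    obs = list(state) + list(tracking_error)
    if estimator is not None:
        u_hat, v_hat = estimator(window)
        obs += [u_hat, v_hat]
    return obs


def stand_in_estimator(window: Window) -> Tuple[float, float]:
    """Placeholder for the trained network (assumption for illustration).

    It averages the last two columns of the window, which here are pretended
    to carry wind information. The real estimator is a recurrent network.
    """
    n = len(window)
    u = sum(row[-2] for row in window) / n
    v = sum(row[-1] for row in window) / n
    return u, v


window = [[0.0, 0.0, 5.0, 1.0], [0.0, 0.0, 7.0, 3.0]]
wind_aware = build_policy_observation([1.0, 2.0, 3.0], [0.1, 0.2], window, stand_in_estimator)
wind_blind = build_policy_observation([1.0, 2.0, 3.0], [0.1, 0.2], window, None)
```

Keeping the estimator behind a plain callable makes the "frozen" property explicit: the controller can be trained or swapped without touching the estimator, and a wind-blind variant is obtained by passing `None`.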

If this is right

  • The full controller wins on 100 percent of evaluation episodes against the wind-blind baseline across mean winds of 4 m/s to 12 m/s.
  • A three-way ablation separates a kinematic component, available without wind information, from a wind-perception component whose share rises with wind speed toward roughly half the total benefit.
  • The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s where the baseline fails catastrophically.
  • The estimator generalizes to vertical ascent profiles with a skill score of 0.861 relative to a constant-wind reference.
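The ablation bullet above describes splitting the total improvement into two parts. One plausible way to compute such a split is sketched below. It assumes the three arms are the proportional-derivative baseline, a learned controller without wind input, and the learned controller with wind input; the page does not confirm this arrangement. The function name and the numbers are placeholders, not results from the paper.

```python
def decompose_gain(e_baseline, e_rl_blind, e_rl_wind):
    """Split the total tracking-error reduction into two additive parts.

    e_baseline: mean error of the wind-blind baseline controller
    e_rl_blind: mean error of a learned controller without wind input
    e_rl_wind:  mean error of the learned controller with the wind estimate
    Returns the total gain and the kinematic / perception shares of it.
    """
    total = e_baseline - e_rl_wind
    if total <= 0:
        raise ValueError("no improvement to decompose")
    kinematic = e_baseline - e_rl_blind   # gain available without wind information
    perception = e_rl_blind - e_rl_wind   # extra gain from the wind estimate
    return {
        "total": total,
        "kinematic_share": kinematic / total,
        "perception_share": perception / total,
    }


# Placeholder errors in metres, chosen only to illustrate an even split.
shares = decompose_gain(e_baseline=1.0, e_rl_blind=0.76, e_rl_wind=0.52)
```

By construction the two shares sum to one, so a perception share near one half means the wind estimate contributes about as much as everything the learned controller gains without it.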

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulation matches real turbulence closely enough, the same pipeline could be applied to other small UAV platforms that lack dedicated wind sensors.
  • Because aerodynamic drag scales quadratically, the relative value of explicit wind perception should increase further for faster vehicles or in gustier conditions.
  • The graceful degradation on stronger winds suggests the approach may still provide partial benefit even when the estimator encounters regimes outside its training distribution.
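The quadratic-drag argument in the list above can be made explicit with a short worked equation. It assumes a lumped drag model acting on the air-relative velocity, with air density \rho, drag coefficient C_d, and reference area A. These symbols are introduced here for illustration, and the paper's own aerodynamic model may differ.

```latex
\mathbf{F}_d = -\tfrac{1}{2}\,\rho\,C_d\,A\,
  \lVert \mathbf{v} - \mathbf{w} \rVert\,(\mathbf{v} - \mathbf{w}),
\qquad
\lVert \mathbf{F}_d \rVert \propto \lVert \mathbf{v} - \mathbf{w} \rVert^{2}

% For a hovering vehicle (\mathbf{v} = \mathbf{0}), comparing mean winds of
% 12 m/s and 4 m/s:
\frac{\lVert \mathbf{F}_d \rVert_{w = 12}}{\lVert \mathbf{F}_d \rVert_{w = 4}}
  = \left(\frac{12}{4}\right)^{2} = 9
```

Under this assumption a threefold increase in wind speed produces a ninefold increase in the disturbance force, which is consistent with wind perception mattering more at the upper end of the tested range.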

Load-bearing premise

The simulated von Karman turbulence with power-law shear and veer is representative enough of real atmospheric boundary layer conditions that the trained estimator and controller will transfer to physical flights without major retraining or new failure modes.
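For readers unfamiliar with the mean-wind part of this premise, the sketch below shows a standard power-law shear profile combined with a simple linear veer. The power law itself is the usual textbook form. The default exponent of 0.14, the 10 m reference height, the linear veer rate, and the function name are all illustrative assumptions, not parameters taken from the paper.

```python
import math


def wind_at_height(z, u_ref, z_ref=10.0, alpha=0.14,
                   dir_ref_deg=270.0, veer_deg_per_m=0.02):
    """Mean horizontal wind at height z (metres above ground).

    Speed follows the power law u(z) = u_ref * (z / z_ref) ** alpha.
    Direction turns linearly with height (assumed form of veer).
    Directions use the meteorological convention: the bearing the wind
    blows FROM, in degrees clockwise from north.
    Returns (speed, direction_deg, u_east, v_north).
    """
    if z <= 0 or z_ref <= 0:
        raise ValueError("heights must be positive")
    speed = u_ref * (z / z_ref) ** alpha
    direction = (dir_ref_deg + veer_deg_per_m * (z - z_ref)) % 360.0
    theta = math.radians(direction)
    u_east = -speed * math.sin(theta)   # eastward component of the air motion
    v_north = -speed * math.cos(theta)  # northward component of the air motion
    return speed, direction, u_east, v_north


# A westerly wind of 8 m/s at 10 m, evaluated at the reference height and at 40 m.
low = wind_at_height(10.0, 8.0)
high = wind_at_height(40.0, 8.0)
```

Turbulence would be superimposed on this mean profile; the real-world question raised above is whether such an idealized profile plus synthetic turbulence captures what a vehicle actually encounters.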

What would settle it

Real outdoor flight tests that directly compare horizontal trajectory tracking error of the learned controller against the wind-blind baseline under measured atmospheric turbulence at mean winds of 4 m/s to 12 m/s.
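Such a comparison reduces to two summary numbers per test campaign, which the sketch below computes from paired per-episode errors. It rests on two assumptions that the page itself flags as unstated in the paper: that a "win" means lower tracking error than the baseline on the same episode, and that the percentage reduction is taken on the mean of per-episode errors. The function name and the sample values are hypothetical.

```python
def compare_episodes(baseline_err, learned_err):
    """Summarize paired per-episode tracking errors.

    baseline_err, learned_err: equal-length sequences, one entry per episode,
    flown under the same wind conditions.
    Returns (percent_reduction_of_mean_error, win_rate_percent).
    """
    if len(baseline_err) != len(learned_err) or not baseline_err:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(baseline_err)
    mean_b = sum(baseline_err) / n
    mean_l = sum(learned_err) / n
    reduction = 100.0 * (mean_b - mean_l) / mean_b
    # Assumed definition of a win: strictly lower error than the baseline.
    wins = sum(1 for b, l in zip(baseline_err, learned_err) if l < b)
    return reduction, 100.0 * wins / n


# Placeholder errors in metres for two episodes (not data from the paper).
reduction, win_rate = compare_episodes([2.0, 4.0], [1.0, 3.0])
```

Reporting both numbers matters because they answer different questions: the reduction can be dominated by a few hard episodes, while the win rate treats every episode equally.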

Original abstract

Small multirotor aircraft are increasingly tasked with operations in the atmospheric boundary layer, where turbulent winds comparable to the vehicle's airspeed degrade trajectory tracking and can defeat conventional feedback control. This work illustrates a two-stage learning pipeline that first estimates the local wind from onboard kinematics and dynamics and then exploits that estimate inside a reinforcement learning (RL) flight controller. The wind estimator, an attention-augmented gated recurrent network trained on thousands of simulated flights through von Karman turbulence with power-law shear and veer, recovers the horizontal wind vector with a per-flight root-mean-square error of 0.40 m/s and a direction error of 3.2 degrees on unseen wind regimes, an accuracy near the floor imposed by unresolved turbulence, and generalizes to vertical ascent profiles with a skill score of 0.861 over a constant-wind reference. A proximal policy optimization controller receiving the frozen estimator's output reduces horizontal trajectory tracking error by 48% relative to a wind-blind proportional-derivative baseline across mean winds of 4 m/s to 12 m/s, winning on 100% of evaluation episodes. A three-way ablation decomposes this improvement into a kinematic component, available without wind information, and a wind-perception component; the perception share rises with wind speed, from small in light winds toward roughly half the total benefit in strong winds, consistent with the quadratic scaling of aerodynamic drag. The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s, where the baseline fails catastrophically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a two-stage learning pipeline for small quadrotor control in simulated atmospheric turbulence: an attention-augmented gated recurrent network is trained to estimate the horizontal wind vector from onboard kinematics and dynamics (0.40 m/s RMSE, 3.2° direction error on unseen regimes), after which its frozen output is supplied to a proximal policy optimization (PPO) controller. The resulting policy reduces horizontal trajectory tracking error by 48% relative to a wind-blind PD baseline across mean winds of 4–12 m/s, wins on 100% of evaluation episodes, and shows graceful degradation on out-of-distribution winds of 13–15 m/s; a three-way ablation isolates the contribution of the wind-perception component, which grows with wind speed.

Significance. If the reported simulation results hold, the work supplies a concrete, quantitatively supported demonstration that learned onboard wind estimation can be integrated into RL flight control to yield substantial tracking improvements whose magnitude scales with aerodynamic drag, together with an explicit decomposition of kinematic versus perceptual benefits. The provision of RMSE, skill score, ablation percentages, and OOD behavior inside a single consistent von Karman + power-law simulation environment strengthens the internal validity of the central performance claim.

major comments (2)
  1. [Methods (estimator training)] Methods section on estimator training: the description of the attention-augmented GRU training procedure omits concrete hyperparameter values (learning rate, batch size, sequence length, optimizer settings) and the precise train/validation/test split ratios used for the “thousands of simulated flights,” which directly affects reproducibility of the reported 0.40 m/s RMSE and 0.861 skill score.
  2. [Evaluation (controller)] Evaluation section on controller performance: the number of held-out episodes, the exact sampling distribution of wind profiles within the 4–12 m/s range, and the precise definition of a “win” episode underlying the 100% win-rate claim are not stated, weakening the statistical grounding of the 48% error-reduction figure and the ablation percentages.
minor comments (1)
  1. [Results] The definition of the skill score (0.861) relative to the constant-wind reference baseline should be stated explicitly in the main text rather than only in the abstract.
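To illustrate what an explicit definition could look like, the sketch below uses the common mean-squared-error skill score, one minus the ratio of the estimator's error to the reference's error. This is an assumption: the paper's definition is not given on this page, and the choice of the profile mean as the constant-wind reference is likewise hypothetical.

```python
def skill_score(truth, estimate, reference=None):
    """MSE-based skill score: 1 - MSE(estimate) / MSE(reference).

    truth, estimate: equal-length sequences of wind speed along a profile.
    reference: the constant-wind reference; if omitted, the mean of `truth`
    is used at every level (an assumed choice).
    A score of 1 is a perfect estimate; 0 is no better than the reference.
    """
    n = len(truth)
    if n == 0 or len(estimate) != n:
        raise ValueError("need two non-empty sequences of equal length")
    if reference is None:
        mean_truth = sum(truth) / n
        reference = [mean_truth] * n
    mse_est = sum((e - t) ** 2 for e, t in zip(estimate, truth)) / n
    mse_ref = sum((r - t) ** 2 for r, t in zip(reference, truth)) / n
    if mse_ref == 0:
        raise ValueError("reference error is zero; skill score undefined")
    return 1.0 - mse_est / mse_ref


# Placeholder ascent profile in m/s (not data from the paper).
truth = [4.0, 6.0, 8.0, 10.0]
estimate = [4.5, 6.5, 7.5, 9.5]
score = skill_score(truth, estimate)
```

Under this definition, a value such as 0.861 would mean the estimator removes about 86 percent of the squared error left by the constant-wind reference.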

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below and will incorporate the necessary details into the revised manuscript to improve reproducibility.

Point-by-point responses
  1. Referee: [Methods (estimator training)] Methods section on estimator training: the description of the attention-augmented GRU training procedure omits concrete hyperparameter values (learning rate, batch size, sequence length, optimizer settings) and the precise train/validation/test split ratios used for the “thousands of simulated flights,” which directly affects reproducibility of the reported 0.40 m/s RMSE and 0.861 skill score.

    Authors: We agree with the referee that these details are crucial for reproducibility. The revised manuscript will include the specific hyperparameter values used for training the attention-augmented GRU (including learning rate, batch size, sequence length, and optimizer settings) as well as the exact train/validation/test split ratios applied to the dataset of simulated flights.

    revision: yes

  2. Referee: [Evaluation (controller)] Evaluation section on controller performance: the number of held-out episodes, the exact sampling distribution of wind profiles within the 4–12 m/s range, and the precise definition of a “win” episode underlying the 100% win-rate claim are not stated, weakening the statistical grounding of the 48% error-reduction figure and the ablation percentages.

    Authors: We concur that explicitly stating these evaluation parameters will strengthen the paper. In the revision, we will specify the number of held-out episodes used for testing, detail the sampling procedure for wind profiles in the 4–12 m/s range, and provide the exact definition of a “win” episode, that is, the condition under which the policy is considered to have succeeded in the trajectory tracking task.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper's central claims rest on empirical simulation results: the estimator is trained on simulated flights and evaluated on unseen wind regimes with reported RMSE; the PPO controller is trained with the frozen estimator and evaluated on held-out episodes across wind speeds, with a three-way ablation and OOD tests. No equations, self-citations, or derivations are shown that reduce predictions to inputs by construction. All quantitative results (48% error reduction, 100% win rate, skill score) are obtained from direct simulation comparisons without self-definitional or fitted-input circularity.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the fidelity of the turbulence simulation used for all training and testing; the neural network weights and policy parameters are fitted quantities whose values are not independently derived.

free parameters (2)
  • wind estimator network weights
    Trained end-to-end on thousands of simulated flight trajectories through von Karman turbulence
  • PPO policy network parameters
    Learned via reinforcement learning episodes that use the frozen estimator output as input
axioms (1)
  • domain assumption: Simulated von Karman turbulence with power-law shear and veer is representative of real atmospheric boundary layer conditions
    All training data and evaluation episodes are generated from this model
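For reference, the longitudinal component of the von Karman continuous turbulence model is usually written as the power spectral density below, where \sigma_u is the turbulence intensity, L_u the turbulence scale length, and \Omega the spatial frequency. This is the standard textbook form; whether the paper uses exactly this parameterization, and what values it assigns to \sigma_u and L_u, is not stated on this page.

```latex
\Phi_u(\Omega) = \sigma_u^{2}\,\frac{2 L_u}{\pi}\,
  \frac{1}{\left[\,1 + \left(1.339\,L_u\,\Omega\right)^{2}\right]^{5/6}}
```

The representativeness assumption in the ledger therefore comes down to whether a stationary spectrum of this shape, with the chosen intensity and length scale, matches the gust content a small vehicle meets near the ground.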


