Wind-Aware Reinforcement Learning Control of a Small Quadrotor Using Learned Onboard Wind Estimation in Simulated Atmospheric Turbulence
Pith reviewed 2026-07-03 20:47 UTC · model grok-4.3
The pith
A learned onboard wind estimator lets a reinforcement learning controller cut quadrotor trajectory error by 48 percent in simulated turbulent winds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An attention-augmented gated recurrent network, trained on thousands of simulated flights through von Karman turbulence with power-law shear and veer, recovers the horizontal wind vector with 0.40 m/s per-flight root-mean-square error and 3.2 degrees direction error on unseen wind regimes. Supplying the frozen estimator output to a proximal policy optimization controller then reduces horizontal trajectory tracking error by 48 percent relative to a wind-blind proportional-derivative baseline across 4 m/s to 12 m/s mean winds. The controller beats the baseline on 100 percent of evaluation episodes, and the wind-perception share of the gain increases with wind speed.
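The two estimator metrics quoted above can be illustrated with a minimal sketch. The function name `horizontal_wind_errors` is hypothetical, and the choice of a vector RMSE (rather than a per-component RMSE) and a mean absolute direction error is an assumption; the paper's exact definitions are not given here.

```python
import math

def horizontal_wind_errors(true_uv, est_uv):
    """Return (vector RMSE in m/s, mean absolute direction error in degrees).

    true_uv, est_uv: equal-length lists of (u, v) horizontal wind components.
    Assumption: RMSE is taken over the 2D error vector magnitude.
    """
    sq_sum = 0.0
    dir_sum = 0.0
    for (u, v), (ue, ve) in zip(true_uv, est_uv):
        sq_sum += (ue - u) ** 2 + (ve - v) ** 2
        # Angle difference between estimate and truth, wrapped to [-180, 180)
        diff = math.degrees(math.atan2(ve, ue) - math.atan2(v, u))
        diff = (diff + 180.0) % 360.0 - 180.0
        dir_sum += abs(diff)
    n = len(true_uv)
    return math.sqrt(sq_sum / n), dir_sum / n
```

The wrap step matters near the ±180 degree boundary, where a naive subtraction would report an error close to 360 degrees for two nearly identical directions.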
What carries the argument
A two-stage pipeline in which an attention-augmented gated recurrent network first estimates the local wind vector from onboard kinematics and dynamics, after which a proximal policy optimization controller incorporates the frozen estimator output.
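The sketch below illustrates the two ideas in that description under stated assumptions: attention as a softmax-weighted pooling of recurrent hidden states, and a policy observation formed by appending the frozen estimator's output to the vehicle state. Both function names are hypothetical, dot-product scoring is one common attention variant rather than the paper's confirmed design, and the recurrent cell itself is omitted.

```python
import math

def attention_pool(hidden_states, query):
    """Softmax-weighted sum of hidden states, scored by dot product with query.

    hidden_states: list of equal-length vectors (one per time step).
    Returns (context_vector, attention_weights).
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights

def build_policy_observation(vehicle_state, wind_estimate):
    """Stage two input: vehicle state with the frozen estimator's wind appended."""
    return list(vehicle_state) + list(wind_estimate)
```

Keeping the estimator frozen means the controller's training signal never changes the wind estimate, so the estimator's reported accuracy still describes what the policy sees.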
If this is right
- The full controller outperforms the wind-blind baseline on 100 percent of evaluation episodes.
- An ablation isolates a kinematic component available without wind information and a wind-perception component whose share rises with wind speed toward half the total benefit.
- The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s where the baseline fails catastrophically.
- The estimator generalizes to vertical ascent profiles with a skill score of 0.861 relative to a constant-wind reference.
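The page does not define the skill score. A common form, assumed here, is one minus the ratio of the estimator's mean squared error to that of the reference; the function name `mse_skill_score` is hypothetical.

```python
def mse_skill_score(truth, estimate, reference):
    """Skill = 1 - MSE(estimate) / MSE(reference).

    1.0 means a perfect estimate, 0.0 means no better than the reference,
    and negative values mean worse than the reference.
    Assumption: this MSE-based form; the paper may use a different definition.
    """
    n = len(truth)
    mse_est = sum((e - t) ** 2 for e, t in zip(estimate, truth)) / n
    mse_ref = sum((r - t) ** 2 for r, t in zip(reference, truth)) / n
    return 1.0 - mse_est / mse_ref
```

Under this definition, a score of 0.861 would mean the estimator's squared error is about 14 percent of the squared error obtained by assuming a constant wind over the ascent.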
Where Pith is reading between the lines
- If the simulation matches real turbulence closely enough, the same pipeline could be applied to other small UAV platforms that lack dedicated wind sensors.
- Because aerodynamic drag scales quadratically with airspeed, the relative value of explicit wind perception should increase further for faster vehicles or in gustier conditions.
- The graceful degradation on stronger winds suggests the approach may still provide partial benefit even when the estimator encounters regimes outside its training distribution.
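The quadratic drag argument in these points can be made concrete with a short sketch. The standard drag law is used; the air density and drag-area values are placeholder assumptions, not vehicle parameters from the paper.

```python
def drag_force(airspeed, rho=1.225, cd_area=0.02):
    """Drag magnitude in newtons: 0.5 * rho * (Cd * A) * airspeed**2.

    rho: air density in kg/m^3 (sea-level value).
    cd_area: drag coefficient times reference area in m^2 (hypothetical value).
    """
    return 0.5 * rho * cd_area * airspeed ** 2

# For a hovering vehicle the airspeed equals the wind speed, so going from
# 4 m/s to 12 m/s (a factor of 3) raises the drag force by a factor of 9.
ratio = drag_force(12.0) / drag_force(4.0)
```

The ratio is independent of the placeholder constants, which is why the disturbance a wind-blind controller must reject grows much faster than the wind speed itself.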
Load-bearing premise
The simulated von Karman turbulence with power-law shear and veer is representative enough of real atmospheric boundary layer conditions that the trained estimator and controller will transfer to physical flights without major retraining or new failure modes.
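For orientation, the mean-wind part of such a simulation might look like the sketch below: a power-law speed profile plus a linear change of direction with height. Every parameter value (reference height, shear exponent, veer rate) is an illustrative assumption, the function name is hypothetical, and the von Karman turbulence that would be superimposed on this mean profile is not modeled.

```python
import math

def mean_wind_at_height(z, speed_ref, z_ref=10.0, alpha=0.14,
                        dir_ref_deg=270.0, veer_deg_per_m=0.05):
    """Mean horizontal wind (u east, v north) in m/s at height z > 0 metres.

    Shear: speed = speed_ref * (z / z_ref) ** alpha   (power law).
    Veer:  direction changes linearly with height (assumed form).
    Direction follows the meteorological convention: the bearing the wind
    blows FROM, in degrees clockwise from north.
    """
    speed = speed_ref * (z / z_ref) ** alpha
    direction = (dir_ref_deg + veer_deg_per_m * (z - z_ref)) % 360.0
    theta = math.radians(direction)
    u = -speed * math.sin(theta)  # eastward component
    v = -speed * math.cos(theta)  # northward component
    return u, v
```

Real boundary layers can depart from both forms, for example under stable stratification or near obstacles, which is part of why this premise is load-bearing.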
What would settle it
Real outdoor flight tests that directly compare horizontal trajectory tracking error of the learned controller against the wind-blind baseline under measured atmospheric turbulence at mean winds of 4 m/s to 12 m/s.
Original abstract
Small multirotor aircraft are increasingly tasked with operations in the atmospheric boundary layer, where turbulent winds comparable to the vehicle's airspeed degrade trajectory tracking and can defeat conventional feedback control. This work illustrates a two-stage learning pipeline that first estimates the local wind from onboard kinematics and dynamics and then exploits that estimate inside a reinforcement learning (RL) flight controller. The wind estimator, an attention-augmented gated recurrent network trained on thousands of simulated flights through von Karman turbulence with power-law shear and veer, recovers the horizontal wind vector with a per-flight root-mean-square error of 0.40 m/s and a direction error of 3.2 degrees on unseen wind regimes, an accuracy near the floor imposed by unresolved turbulence, and generalizes to vertical ascent profiles with a skill score of 0.861 over a constant-wind reference. A proximal policy optimization controller receiving the frozen estimator's output reduces horizontal trajectory tracking error by 48% relative to a wind-blind proportional-derivative baseline across mean winds of 4 m/s to 12 m/s, winning on 100% of evaluation episodes. A three-way ablation decomposes this improvement into a kinematic component, available without wind information, and a wind-perception component; the perception share rises with wind speed, from small in light winds toward roughly half the total benefit in strong winds, consistent with the quadratic scaling of aerodynamic drag. The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s, where the baseline fails catastrophically.
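The decomposition described in the abstract can be sketched as simple arithmetic over three tracking errors. The three arms assumed here (PD baseline, RL policy without wind input, RL policy with wind input) are a guess at what the three-way ablation compares, the function name is hypothetical, and the numbers in the usage lines are made up for illustration rather than taken from the paper.

```python
def ablation_shares(err_pd, err_rl_blind, err_rl_wind):
    """Split the total error reduction into kinematic and perception parts.

    err_pd:       tracking error of the wind-blind PD baseline
    err_rl_blind: tracking error of the RL policy without wind input (assumed arm)
    err_rl_wind:  tracking error of the RL policy with the wind estimate
    """
    total_gain = err_pd - err_rl_wind
    kinematic_gain = err_pd - err_rl_blind
    perception_gain = err_rl_blind - err_rl_wind
    return {
        "total_reduction": total_gain / err_pd,
        "kinematic_share": kinematic_gain / total_gain,
        "perception_share": perception_gain / total_gain,
    }

# Illustrative values only: a 48% total reduction split evenly between parts.
shares = ablation_shares(err_pd=1.00, err_rl_blind=0.76, err_rl_wind=0.52)
```

Computing these shares separately per wind-speed bin would show the trend the abstract describes, with the perception share growing as winds strengthen.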
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a two-stage learning pipeline for small quadrotor control in simulated atmospheric turbulence: an attention-augmented gated recurrent network is trained to estimate the horizontal wind vector from onboard kinematics and dynamics (0.40 m/s RMSE, 3.2° direction error on unseen regimes), after which its frozen output is supplied to a proximal policy optimization (PPO) controller. The resulting policy reduces horizontal trajectory tracking error by 48% relative to a wind-blind PD baseline across mean winds of 4–12 m/s, wins on 100% of evaluation episodes, and shows graceful degradation on out-of-distribution winds of 13–15 m/s; a three-way ablation isolates the contribution of the wind-perception component, which grows with wind speed.
Significance. If the reported simulation results hold, the work supplies a concrete, quantitatively supported demonstration that learned onboard wind estimation can be integrated into RL flight control to yield substantial tracking improvements whose magnitude scales with aerodynamic drag, together with an explicit decomposition of kinematic versus perceptual benefits. The provision of RMSE, skill score, ablation percentages, and OOD behavior inside a single consistent von Karman + power-law simulation environment strengthens the internal validity of the central performance claim.
major comments (2)
- [Methods (estimator training)] Methods section on estimator training: the description of the attention-augmented GRU training procedure omits concrete hyperparameter values (learning rate, batch size, sequence length, optimizer settings) and the precise train/validation/test split ratios used for the “thousands of simulated flights,” which directly affects reproducibility of the reported 0.40 m/s RMSE and 0.861 skill score.
- [Evaluation (controller)] Evaluation section on controller performance: the number of held-out episodes, the exact sampling distribution of wind profiles within the 4–12 m/s range, and the precise definition of a “win” episode underlying the 100% win-rate claim are not stated, weakening the statistical grounding of the 48% error-reduction figure and the ablation percentages.
minor comments (1)
- [Results] The definition of the skill score (0.861) relative to the constant-wind reference baseline should be stated explicitly in the main text rather than only in the abstract.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below and will incorporate the necessary details into the revised manuscript to improve reproducibility.
- Referee: [Methods (estimator training)] Methods section on estimator training: the description of the attention-augmented GRU training procedure omits concrete hyperparameter values (learning rate, batch size, sequence length, optimizer settings) and the precise train/validation/test split ratios used for the “thousands of simulated flights,” which directly affects reproducibility of the reported 0.40 m/s RMSE and 0.861 skill score.
Authors: We agree with the referee that these details are crucial for reproducibility. The revised manuscript will include the specific hyperparameter values used for training the attention-augmented GRU (including learning rate, batch size, sequence length, and optimizer settings) as well as the exact train/validation/test split ratios applied to the dataset of simulated flights. revision: yes
- Referee: [Evaluation (controller)] Evaluation section on controller performance: the number of held-out episodes, the exact sampling distribution of wind profiles within the 4–12 m/s range, and the precise definition of a “win” episode underlying the 100% win-rate claim are not stated, weakening the statistical grounding of the 48% error-reduction figure and the ablation percentages.
Authors: We concur that explicitly stating these evaluation parameters will strengthen the paper. In the revision, we will specify the number of held-out episodes used for testing, detail the sampling procedure for wind profiles in the 4–12 m/s range, and provide the exact definition of a “win” episode (such as the condition under which the policy is considered to have succeeded in the trajectory tracking task). revision: yes
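One possible reading of the two disputed statistics is sketched below, assuming paired episodes in which both controllers fly the same wind realization. Counting a "win" as strictly lower episode error than the baseline, and computing the reduction as a ratio of mean errors, are both assumptions; the function name is hypothetical.

```python
def paired_episode_summary(baseline_errors, policy_errors):
    """Return (win_rate, mean_error_reduction) over paired episodes.

    A win is an episode where the policy's tracking error is strictly lower
    than the baseline's on the same wind realization (assumed definition).
    Reduction is 1 - mean(policy) / mean(baseline) (assumed definition).
    """
    pairs = list(zip(baseline_errors, policy_errors))
    wins = sum(1 for b, p in pairs if p < b)
    win_rate = wins / len(pairs)
    mean_base = sum(b for b, _ in pairs) / len(pairs)
    mean_pol = sum(p for _, p in pairs) / len(pairs)
    return win_rate, 1.0 - mean_pol / mean_base
```

The two definitions can diverge: averaging per-episode reductions instead of taking a ratio of means weights low-wind and high-wind episodes differently, which is one reason the referee's request for explicit definitions matters.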
Circularity Check
No significant circularity
The paper's central claims rest on empirical simulation results: the estimator is trained on simulated flights and evaluated on unseen wind regimes with reported RMSE; the PPO controller is trained with the frozen estimator and evaluated on held-out episodes across wind speeds, with a three-way ablation and OOD tests. No equations, self-citations, or derivations are shown that reduce predictions to inputs by construction. All quantitative results (48% error reduction, 100% win rate, skill score) are obtained from direct simulation comparisons without self-definitional or fitted-input circularity.
Axiom & Free-Parameter Ledger
free parameters (2)
- wind estimator network weights
- PPO policy network parameters
axioms (1)
- domain assumption: Simulated von Karman turbulence with power-law shear and veer is representative of real atmospheric boundary layer conditions