Wind-Aware Reinforcement Learning Control of a Small Quadrotor Using Learned Onboard Wind Estimation in Simulated Atmospheric Turbulence

Abdullah Al Tasim; Wei Sun

arXiv: 2607.01528 · v1 · pith:Q4B435WTnew · submitted 2026-07-01 · 💻 cs.LG · cs.SY · eess.SY

Pith reviewed 2026-07-03 20:47 UTC · model grok-4.3

keywords: quadrotor control, wind estimation, reinforcement learning, trajectory tracking, atmospheric turbulence, onboard sensing, proximal policy optimization, simulation training

The pith

A learned onboard wind estimator lets a reinforcement learning controller cut quadrotor trajectory error by 48 percent in simulated turbulent winds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a two-stage pipeline first trains an attention-augmented gated recurrent network to estimate local wind from kinematics and dynamics alone, then supplies that estimate to a proximal policy optimization controller. This matters because conventional feedback methods lose tracking performance once turbulent winds approach vehicle airspeed, restricting small multirotors in the atmospheric boundary layer. The estimator recovers the horizontal wind vector to 0.40 m/s root-mean-square error and 3.2 degrees direction error on unseen regimes, and the controller reduces horizontal tracking error by 48 percent relative to a wind-blind proportional-derivative baseline while succeeding on every evaluation episode from 4 m/s to 12 m/s mean wind. The share of improvement coming from explicit wind perception grows with wind speed, reaching roughly half the total gain in stronger flows, consistent with quadratic aerodynamic drag.

Core claim

Training an attention-augmented gated recurrent network on thousands of simulated flights through von Karman turbulence with power-law shear and veer recovers the horizontal wind vector with 0.40 m/s root-mean-square error and 3.2 degrees direction error on unseen regimes; supplying the frozen estimator output to a proximal policy optimization controller then reduces horizontal trajectory tracking error by 48 percent relative to a wind-blind proportional-derivative baseline across 4 m/s to 12 m/s mean winds, with the controller succeeding on 100 percent of episodes and the wind-perception share of the gain increasing with wind speed.
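
The two estimator metrics quoted above can be computed as in the following minimal sketch. The exact definitions used in the paper are not given on this page, so the vector-RMSE and wrapped-angle forms below are assumptions, and the sample data are made up for illustration.

```python
import numpy as np

def horizontal_wind_rmse(w_true, w_est):
    """RMSE of the horizontal wind vector over one flight, in m/s.

    Both inputs have shape (T, 2): two horizontal components per time step.
    Assumed definition: root of the mean squared vector-error magnitude.
    """
    err = np.asarray(w_est, dtype=float) - np.asarray(w_true, dtype=float)
    return float(np.sqrt(np.mean(np.sum(err**2, axis=1))))

def mean_direction_error_deg(w_true, w_est):
    """Mean absolute direction error in degrees, wrapped to [0, 180]."""
    w_true = np.asarray(w_true, dtype=float)
    w_est = np.asarray(w_est, dtype=float)
    ang_true = np.arctan2(w_true[:, 1], w_true[:, 0])
    ang_est = np.arctan2(w_est[:, 1], w_est[:, 0])
    # Wrap the difference into [-pi, pi] so 359 deg vs 1 deg counts as 2 deg.
    diff = np.arctan2(np.sin(ang_est - ang_true), np.cos(ang_est - ang_true))
    return float(np.degrees(np.mean(np.abs(diff))))

# Toy data (made up): a steady 8 m/s wind and an estimate with a constant offset.
w_true = np.tile([8.0, 0.0], (100, 1))
w_est = w_true + np.array([0.3, 0.4])
rmse = horizontal_wind_rmse(w_true, w_est)
dir_err = mean_direction_error_deg(w_true, w_est)
print(f"RMSE = {rmse:.2f} m/s, direction error = {dir_err:.2f} deg")
```

Wrapping the angle difference matters for winds near the 0/360 degree boundary, where a naive subtraction would report errors close to 360 degrees.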

What carries the argument

A two-stage pipeline in which an attention-augmented gated recurrent network first estimates the local wind vector from onboard kinematics and dynamics, after which a proximal policy optimization controller incorporates the frozen estimator output.
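
The pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions and not the authors' architecture: the 9-dimensional input, the layer sizes, and the additive attention pooling over GRU hidden states are all hypothetical, and the weights are random and untrained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU update (one common gating convention) for input x and state h."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])            # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])            # reset gate
    h_new = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_new

def estimate_wind(seq, p):
    """Map a (T, n_in) window of onboard signals to a 2-D wind estimate."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x in seq:
        h = gru_step(x, h, p)
        states.append(h)
    H = np.stack(states)                       # (T, n_hidden)
    scores = np.tanh(H @ p["Wa"]) @ p["va"]    # additive attention scores, (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over time steps
    context = weights @ H                      # attention-weighted summary
    return p["Wo"] @ context + p["bo"], weights

def init_params(n_in, n_hidden, n_att, rng):
    """Random, untrained parameters (hypothetical sizes)."""
    def mat(*shape):
        return rng.normal(scale=0.1, size=shape)
    p = {}
    for g in "zrh":
        p["W" + g] = mat(n_hidden, n_in)
        p["U" + g] = mat(n_hidden, n_hidden)
        p["b" + g] = np.zeros(n_hidden)
    p["Wa"], p["va"] = mat(n_hidden, n_att), mat(n_att)
    p["Wo"], p["bo"] = mat(2, n_hidden), np.zeros(2)
    return p

rng = np.random.default_rng(0)
params = init_params(n_in=9, n_hidden=16, n_att=8, rng=rng)
window = rng.normal(size=(50, 9))   # stand-in for onboard kinematic/dynamic signals
wind_xy, attn = estimate_wind(window, params)
# Stage two: the frozen estimate is appended to the controller's observation.
policy_obs = np.concatenate([window[-1], wind_xy])
```

Keeping the estimator frozen while the policy trains means the controller sees the same estimate distribution it would see at deployment, which is one plausible reason for the two-stage design.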

If this is right

The full controller succeeds on 100 percent of evaluation episodes while the wind-blind baseline does not.
An ablation isolates a kinematic component available without wind information and a wind-perception component whose share rises with wind speed toward half the total benefit.
The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s where the baseline fails catastrophically.
The estimator generalizes to vertical ascent profiles with a skill score of 0.861 relative to a constant-wind reference.
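
For readers unfamiliar with skill scores, the sketch below shows one common form, 1 - MSE(model) / MSE(reference). The paper's exact definition is not stated on this page, so this form is an assumption, and the profile values are made up.

```python
def skill_score(truth, prediction, reference):
    """Skill score 1 - MSE(prediction) / MSE(reference).

    1.0 is a perfect prediction, 0.0 matches the reference, negative is worse.
    """
    if not (len(truth) == len(prediction) == len(reference)):
        raise ValueError("all sequences must have the same length")
    mse_pred = sum((p - t) ** 2 for p, t in zip(prediction, truth)) / len(truth)
    mse_ref = sum((r - t) ** 2 for r, t in zip(reference, truth)) / len(truth)
    if mse_ref == 0.0:
        raise ValueError("reference is already perfect; skill score undefined")
    return 1.0 - mse_pred / mse_ref

# Toy ascent profile (made up): wind speed in m/s at increasing altitudes.
truth = [4.0, 5.0, 6.0, 7.0, 8.0]
estimate = [4.2, 5.1, 5.8, 7.1, 8.2]
constant_ref = [6.0] * len(truth)   # constant-wind reference at the profile mean
score = skill_score(truth, estimate, constant_ref)
print(f"skill score = {score:.3f}")
```

Under this definition, a score of 0.861 would mean the estimator removes about 86 percent of the squared error that a constant-wind guess leaves behind.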

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the simulation matches real turbulence closely enough, the same pipeline could be applied to other small UAV platforms that lack dedicated wind sensors.
Because aerodynamic drag scales quadratically, the relative value of explicit wind perception should increase further for faster vehicles or in gustier conditions.
The graceful degradation on stronger winds suggests the approach may still provide partial benefit even when the estimator encounters regimes outside its training distribution.
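
The quadratic-drag argument in the list above can be made concrete with the standard drag law. This is a sketch assuming a constant lumped drag coefficient; the symbols rho, C_d, and A are not taken from the paper.

```latex
% Assumed standard quadratic drag law; v is vehicle velocity, w is wind velocity.
\[
  \mathbf{F}_d = -\tfrac{1}{2}\,\rho\, C_d A\, \lVert \mathbf{v}-\mathbf{w} \rVert\, (\mathbf{v}-\mathbf{w}),
  \qquad
  \lVert \mathbf{F}_d \rVert = \tfrac{1}{2}\,\rho\, C_d A\, \lVert \mathbf{v}-\mathbf{w} \rVert^{2}.
\]
% Near hover (v approximately zero) the disturbance scales with squared wind speed:
\[
  \frac{\lVert \mathbf{F}_d \rVert_{12\,\mathrm{m/s}}}{\lVert \mathbf{F}_d \rVert_{4\,\mathrm{m/s}}}
  = \left(\frac{12}{4}\right)^{2} = 9 .
\]
```

Under these assumptions, tripling the mean wind across the evaluated range raises the drag disturbance roughly ninefold, which is consistent with wind perception mattering more in stronger flows.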

Load-bearing premise

The simulated von Karman turbulence with power-law shear and veer is representative enough of real atmospheric boundary layer conditions that the trained estimator and controller will transfer to physical flights without major retraining or new failure modes.
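
To make the simulated mean-wind model concrete, here is a minimal sketch of a power-law shear profile with veer. The power law itself is standard; the reference values, the exponent, and the linear veer rate are illustrative assumptions and are not the paper's settings.

```python
import math

def wind_at_height(z, u_ref=8.0, z_ref=10.0, alpha=0.14,
                   dir_ref_deg=270.0, veer_deg_per_m=0.05):
    """Mean horizontal wind (speed in m/s, direction in degrees) at height z in m.

    Shear follows the power law U(z) = u_ref * (z / z_ref) ** alpha.
    Veer is modeled here as a linear change of direction with height, which is
    an illustrative assumption and not taken from the paper.
    """
    if z <= 0.0:
        raise ValueError("height must be positive")
    speed = u_ref * (z / z_ref) ** alpha
    direction = (dir_ref_deg + veer_deg_per_m * (z - z_ref)) % 360.0
    return speed, direction

# Mean wind at a few heights; turbulence would be superimposed on this profile.
for z in (10.0, 40.0, 100.0):
    speed, direction = wind_at_height(z)
    print(f"z = {z:5.1f} m: {speed:.2f} m/s from {direction:.1f} deg")
```

A real boundary layer can depart from this profile under strong stability or convection, which is part of why the premise above is load-bearing.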

What would settle it

Real outdoor flight tests that directly compare horizontal trajectory tracking error of the learned controller against the wind-blind baseline under measured atmospheric turbulence at mean winds of 4 m/s to 12 m/s.

Original abstract

Small multirotor aircraft are increasingly tasked with operations in the atmospheric boundary layer, where turbulent winds comparable to the vehicle's airspeed degrade trajectory tracking and can defeat conventional feedback control. This work illustrates a two-stage learning pipeline that first estimates the local wind from onboard kinematics and dynamics and then exploits that estimate inside a reinforcement learning (RL) flight controller. The wind estimator, an attention-augmented gated recurrent network trained on thousands of simulated flights through von Karman turbulence with power-law shear and veer, recovers the horizontal wind vector with a per-flight root-mean-square error of 0.40 m/s and a direction error of 3.2 degrees on unseen wind regimes, an accuracy near the floor imposed by unresolved turbulence, and generalizes to vertical ascent profiles with a skill score of 0.861 over a constant-wind reference. A proximal policy optimization controller receiving the frozen estimator's output reduces horizontal trajectory tracking error by 48% relative to a wind-blind proportional-derivative baseline across mean winds of 4 m/s to 12 m/s, winning on 100% of evaluation episodes. A three-way ablation decomposes this improvement into a kinematic component, available without wind information, and a wind-perception component; the perception share rises with wind speed, from small in light winds toward roughly half the total benefit in strong winds, consistent with the quadratic scaling of aerodynamic drag. The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s, where the baseline fails catastrophically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a two-stage simulation pipeline in which an attention-GRU wind estimator, driven by onboard kinematics and dynamics, feeds a PPO controller and cuts tracking error by 48% versus a PD baseline, with useful ablations, but everything stays within simulated von Karman turbulence.

The main takeaway is that this work gives a working example of training an attention-augmented GRU on simulated quadrotor flights through von Karman turbulence to estimate horizontal wind from onboard kinematics and dynamics, then freezing the estimator and feeding its output as an extra input to a PPO policy. The controller then beats a wind-blind PD baseline by 48% on horizontal tracking error and wins every evaluation episode from 4 to 12 m/s mean wind.

The concrete numbers and the three-way ablation stand out: they separate the kinematic benefit, which is available without any wind information, from the extra gain contributed by learned wind perception, and they show the perception share increasing with wind speed, as expected from quadratic drag scaling. The OOD test at 13 to 15 m/s, where the baseline collapses, is also straightforward to follow. The estimator itself reaches 0.40 m/s RMSE and 3.2 degrees direction error on held-out regimes, which is close to the floor the authors attribute to unresolved turbulence.
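
One plausible reading of the three-way ablation is a comparison of three arms: the wind-blind PD baseline, an RL policy without the wind estimate, and the full RL policy with it. The sketch below decomposes the gain under that assumed reading; the arm names and error values are hypothetical and are not results from the paper.

```python
def decompose_gain(err_pd, err_rl_blind, err_rl_wind):
    """Split the total error reduction into kinematic and perception parts.

    err_pd        tracking error of the wind-blind PD baseline
    err_rl_blind  tracking error of an RL policy without the wind estimate
    err_rl_wind   tracking error of the RL policy with the wind estimate
    """
    total = err_pd - err_rl_wind
    if total <= 0.0:
        raise ValueError("full controller does not improve on the baseline")
    kinematic = err_pd - err_rl_blind          # gain available without wind info
    perception = err_rl_blind - err_rl_wind    # extra gain from the wind estimate
    return {
        "total_reduction": total / err_pd,
        "kinematic_share": kinematic / total,
        "perception_share": perception / total,
    }

# Made-up mean horizontal tracking errors in metres for one wind-speed bin.
result = decompose_gain(err_pd=1.00, err_rl_blind=0.70, err_rl_wind=0.50)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Repeating this per wind-speed bin would show how the perception share changes with mean wind, which is the trend the ablation is reported to reveal.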

The obvious limit is that all results are simulation-only. The von Karman model with power-law shear and veer is a standard choice, but it is still an approximation, and there is no hardware validation or discussion of sensor noise, actuator limits, or model mismatch that would appear on a real small quadrotor. The referee report also flags missing training details and exact hyperparameters, which would slow down anyone wanting to reproduce the numbers.

This is aimed at groups already running RL on multirotors and looking for ways to handle wind without extra sensors. The internal logic holds together and the ablations are honest, so the central sim claim is defensible even if the transfer story is still open.

I would bring the methods section to a reading group to talk through the estimator architecture and the ablation design. I would not cite it yet because it is still simulation-only. It deserves peer review because the quantitative claims are specific, the comparisons are clear, and the gaps are easy to state rather than hidden.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a two-stage learning pipeline for small quadrotor control in simulated atmospheric turbulence: an attention-augmented gated recurrent network is trained to estimate the horizontal wind vector from onboard kinematics and dynamics (0.40 m/s RMSE, 3.2° direction error on unseen regimes), after which its frozen output is supplied to a proximal policy optimization (PPO) controller. The resulting policy reduces horizontal trajectory tracking error by 48% relative to a wind-blind PD baseline across mean winds of 4–12 m/s, wins on 100% of evaluation episodes, and shows graceful degradation on out-of-distribution winds of 13–15 m/s; a three-way ablation isolates the contribution of the wind-perception component, which grows with wind speed.

Significance. If the reported simulation results hold, the work supplies a concrete, quantitatively supported demonstration that learned onboard wind estimation can be integrated into RL flight control to yield substantial tracking improvements whose magnitude scales with aerodynamic drag, together with an explicit decomposition of kinematic versus perceptual benefits. The provision of RMSE, skill score, ablation percentages, and OOD behavior inside a single consistent von Karman + power-law simulation environment strengthens the internal validity of the central performance claim.

major comments (2)

[Methods (estimator training)] Methods section on estimator training: the description of the attention-augmented GRU training procedure omits concrete hyperparameter values (learning rate, batch size, sequence length, optimizer settings) and the precise train/validation/test split ratios used for the “thousands of simulated flights,” which directly affects reproducibility of the reported 0.40 m/s RMSE and 0.861 skill score.
[Evaluation (controller)] Evaluation section on controller performance: the number of held-out episodes, the exact sampling distribution of wind profiles within the 4–12 m/s range, and the precise definition of a “win” episode underlying the 100% win-rate claim are not stated, weakening the statistical grounding of the 48% error-reduction figure and the ablation percentages.
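
To illustrate why the definition of a "win" matters, here is a minimal sketch of a paired evaluation. It assumes a win means strictly lower tracking error than the baseline on the same episode, which is a guess at the intended definition; the error values are made up.

```python
def paired_evaluation(baseline_errors, policy_errors):
    """Return (relative error reduction, win rate) over paired episodes.

    A "win" is assumed to mean the policy's tracking error is strictly lower
    than the baseline's on the same episode (same wind realization).
    """
    if len(baseline_errors) != len(policy_errors) or not baseline_errors:
        raise ValueError("need two non-empty, equal-length error lists")
    mean_base = sum(baseline_errors) / len(baseline_errors)
    mean_pol = sum(policy_errors) / len(policy_errors)
    reduction = 1.0 - mean_pol / mean_base
    wins = sum(p < b for p, b in zip(policy_errors, baseline_errors))
    return reduction, wins / len(baseline_errors)

# Made-up per-episode horizontal tracking errors in metres.
baseline = [0.80, 1.00, 1.20, 1.40]
policy = [0.50, 0.55, 0.60, 0.55]
reduction, win_rate = paired_evaluation(baseline, policy)
print(f"error reduction = {reduction:.0%}, win rate = {win_rate:.0%}")
```

A different definition, such as staying within a fixed error threshold, could give a different win rate on the same data, so the number of episodes and the criterion both need to be reported.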

minor comments (1)

[Results] The definition of the skill score (0.861) relative to the constant-wind reference baseline should be stated explicitly in the main text rather than only in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below and will incorporate the necessary details into the revised manuscript to improve reproducibility.

Referee: [Methods (estimator training)] Methods section on estimator training: the description of the attention-augmented GRU training procedure omits concrete hyperparameter values (learning rate, batch size, sequence length, optimizer settings) and the precise train/validation/test split ratios used for the “thousands of simulated flights,” which directly affects reproducibility of the reported 0.40 m/s RMSE and 0.861 skill score.

Authors: We agree with the referee that these details are crucial for reproducibility. The revised manuscript will include the specific hyperparameter values used for training the attention-augmented GRU (including learning rate, batch size, sequence length, and optimizer settings) as well as the exact train/validation/test split ratios applied to the dataset of simulated flights.
revision: yes
Referee: [Evaluation (controller)] Evaluation section on controller performance: the number of held-out episodes, the exact sampling distribution of wind profiles within the 4–12 m/s range, and the precise definition of a “win” episode underlying the 100% win-rate claim are not stated, weakening the statistical grounding of the 48% error-reduction figure and the ablation percentages.

Authors: We concur that explicitly stating these evaluation parameters will strengthen the paper. In the revision, we will specify the number of held-out episodes used for testing, detail the sampling procedure for wind profiles in the 4–12 m/s range, and provide the exact definition of a “win” episode (such as the condition under which the policy is considered to have succeeded in the trajectory tracking task).
revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper's central claims rest on empirical simulation results: the estimator is trained on simulated flights and evaluated on unseen wind regimes with reported RMSE; the PPO controller is trained with the frozen estimator and evaluated on held-out episodes across wind speeds, with a three-way ablation and OOD tests. No equations, self-citations, or derivations are shown that reduce predictions to inputs by construction. All quantitative results (48% error reduction, 100% win rate, skill score) are obtained from direct simulation comparisons without self-definitional or fitted-input circularity.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the fidelity of the turbulence simulation used for all training and testing; the neural network weights and policy parameters are fitted quantities whose values are not independently derived.

free parameters (2)

wind estimator network weights: trained end-to-end on thousands of simulated flight trajectories through von Karman turbulence
PPO policy network parameters: learned via reinforcement learning episodes that use the frozen estimator output as input

axioms (1)

domain assumption: Simulated von Karman turbulence with power-law shear and veer is representative of real atmospheric boundary layer conditions. All training data and evaluation episodes are generated from this model.
