RL Octocopter -- Karolina Dubiel

Day 25: Being a physicist, aka swinging my drone from the kitchen table

Jun 23, 2026

I modeled and printed a GPS mount, the last component I needed before I could lock down the total system mass. With the drone fully complete, I collected the full set of system identification data. I haven't been able to spend as much time on this drone as I'd have liked over the past ~1.5 weeks, but I hope to lock back in now that the next stretch is software-only.

Yaw bifilar pendulum setup (I promise the strings are parallel and straight IRL)

Here are the results and calculations of my system identification. They're not completely perfect or scientific, but I'm hoping domain randomization can make up for any inaccuracies. I'll be using motor/propulsion characterization information from the manufacturer and have also collected CoM info.

$$ I = \frac{M \, g \, d^2 \, T^2}{4\pi^2 L} \quad\Rightarrow\quad I = 0.73946 \, d^2 T^2 $$

M = 1.177 kg  ·  L = 0.39552 m (wire length)  ·  g = 9.81 m/s²  ·  d = half wire sep. (m)  ·  T = time for 20 osc ÷ 20 (s)

Measurements

Each trial was timed over 20 oscillations (T = total / 20). Wire separations were measured as the full width and halved to get d.

7 trials per axis, all times in seconds for 20 oscillations; the worst 2 in each row (marked *) were dropped before averaging.

Axis	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Avg. of kept runs (s)	T (s)	Wire sep. (mm)	d (m)
Roll	12.97	12.87*	13.15*	12.92	12.9	12.82	12.92	12.906	0.6453	332.592	0.166296
Pitch	12.77	12.64	12.82	12.63*	12.75	12.84*	12.82	12.76	0.638	332.592	0.166296
Yaw	13.84*	13.52	13.52	13.47	13.52	13.4*	13.65	13.536	0.6768	380.314	0.190157

Results

Axis	T (s)	d (m)	I (kg·m²)
Roll	0.6453	0.166296	0.008515
Pitch	0.6380	0.166296	0.008324
Yaw	0.6768	0.190157	0.012248
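As a cross-check, here's a minimal Python sketch of the calculation, assuming the small-angle bifilar model (two parallel wires of length L, each at distance d from the rotation axis and each carrying half the weight) and g = 9.81 m/s². A twist θ tilts each wire by about dθ/L, so the pair exerts a restoring torque of M g d² θ / L; that gives T = 2π√(I L / (M g d²)), i.e. I = M g d² T² / (4π² L). It starts from the kept-run averages and full wire separations in the measurements table.

```python
import math

G = 9.81  # m/s^2


def bifilar_inertia(mass_kg, wire_len_m, half_sep_m, period_s, g=G):
    """Moment of inertia from a small-angle bifilar pendulum.

    Each wire hangs at distance d (= half_sep_m) from the rotation axis and
    carries M*g/2. A twist theta tilts each wire by about d*theta/L, so the
    pair produces a restoring torque M*g*d^2*theta/L. That gives
    T = 2*pi*sqrt(I*L / (M*g*d^2)), i.e. I = M*g*d^2*T^2 / (4*pi^2*L).
    """
    return mass_kg * g * half_sep_m**2 * period_s**2 / (4 * math.pi**2 * wire_len_m)


MASS_KG = 1.177
WIRE_LEN_M = 0.39552

# axis -> (average time for 20 oscillations over the kept runs [s], full wire separation [mm])
MEASUREMENTS = {
    "roll": (12.906, 332.592),
    "pitch": (12.76, 332.592),
    "yaw": (13.536, 380.314),
}

inertia = {}
for axis, (t20_s, sep_mm) in MEASUREMENTS.items():
    period = t20_s / 20            # one oscillation
    half_sep = sep_mm / 1000 / 2   # mm -> m, then halve
    inertia[axis] = bifilar_inertia(MASS_KG, WIRE_LEN_M, half_sep, period)
    print(f"{axis:5s}  T = {period:.4f} s  d = {half_sep:.6f} m  I = {inertia[axis]:.6f} kg*m^2")
```

Keeping the formula in terms of the half separation d means the same function works for both wire spacings used here; only the measured inputs change per axis.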

Day 22: Standing on business and changing of plans

Jun 20, 2026

The drone now has 8 legs to land on, which should make test flights significantly less scary. Although I never optimized this drone for weight when designing it (the body plates could have more cutouts and other optimizations could've been made), I've now decided to be more mindful about each component going forward. In total, the 8 legs weigh 7.67g.

I've also decided to drop the separate companion computer, at least for now. I'd originally planned to bolt one onto the drone to run the RL policy and feed motor commands to Betaflight over MSP -- first a Raspberry Pi 4, then a Teensy when the Pi looked like a bad fit. The problem: whichever board I picked, my architecture needs to send 8 direct per-motor commands, and doing that over MSP fights Betaflight's safety model (motors don't reliably stop on disarm or link loss). So I'm scrapping the extra board. My flight controller is already an STM32H743 (480 MHz Cortex-M7), so I'm compiling the policy straight into the Betaflight firmware on the board I already have, which also cuts a big chunk out of my loop latency. Papers that inspired this: "Learning to Fly in Seconds" (Eschmann et al., RA-L 2024) and Neuroflight (Koch et al., 2019).

I'll try this out and re-evaluate if I run into significant blocks.
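To get a feel for what "compiling the policy into the firmware" involves, here's a rough Python sizing sketch, assuming a small fully connected policy. The layer widths [75, 256, 96, 8] are hypothetical (5 stacked frames of 15 inputs each, mapped to 8 motor outputs), picked only because they land near the ~45k parameters mentioned elsewhere in this log; to_c_array is an illustrative helper showing one plain way to turn trained weights into a const array the firmware build can include.

```python
def mlp_param_count(sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def to_c_array(name, values):
    """Format trained weights as a C float array the firmware build can compile in."""
    body = ", ".join(f"{v:.8e}f" for v in values)
    return f"static const float {name}[{len(values)}] = {{{body}}};"


SIZES = [75, 256, 96, 8]  # hypothetical: 5 frames x 15 inputs -> two hidden layers -> 8 motors
n_params = mlp_param_count(SIZES)
print(f"{n_params} parameters, {n_params * 4 / 1024:.1f} KiB as float32")  # 44904 parameters, 175.4 KiB as float32
print(to_c_array("layer0_bias", [0.0, 0.5, -1.25]))
```

At float32 that's roughly 175 KiB of weights and about one multiply-add per parameter per control step, which at 50 Hz is a small load for a 480 MHz Cortex-M7. Whether it fits alongside Betaflight depends on the exact H743 part's flash and the firmware's own footprint, so that's worth checking on the real board.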

Day 17: The eagle has landed (taken off)!

Jun 15, 2026

Today, exactly 2.5 weeks after the kickoff of my initial idea, I officially completed the pipeline from concept -> flying octocopter with 0 prior hardware or CAD experience. When I started this project, I had never flown a drone before.

I haven't done anything except hover yet, and there's no microcontroller on board, so this is a completely regular octocopter with no RL abilities at all. I also have yet to make a GPS mount, mount my antenna, or strap anything down nicely, so there's a lot of work to do before this thing can properly fly safely.

We have liftoff!

To be clear: currently, if this drone lost a single motor in flight, it would probably stay up. Octocopters are famously tolerant to single motor failure for two reasons:

1. There's huge thrust overcapacity: each of the 8 motors carries only ~125 gf at hover (out of 1393 gf max), and losing one drops total capacity from ~11,000 gf to ~9,750 gf, still enough to maintain a healthy 2:1 thrust-to-weight ratio up to nearly 5 kg of drone weight (we're at 1 kg).

2. Betaflight's PID loop runs at several kHz and doesn't need to know why the drone is tilting -- it just sees the gyro reporting a roll, and commands the remaining motors on the low side to push harder. The yaw imbalance gets partially compensated by Betaflight's mixer redistributing throttle between the surviving CW and CCW motors. The drone would stay airborne with maybe a slow yaw drift and degraded responsiveness, but it usually wouldn't fall out of the sky unless other bad things happened.

The problem is that this only really works for one motor. As soon as a second motor dies (especially two same-rotation motors 90° apart), the static mixer breaks down: it keeps demanding thrust from two dead motors, and the drone becomes uncontrollable. That's the failure mode RL is supposed to fix.
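To make the static-mixer problem concrete, here's a small Python sketch, assuming a hypothetical Flat-X indexing (motor k at 22.5° + 45°·k on a unit-radius circle, spins alternating; this is not Betaflight's actual motor order) and a yaw torque proportional to thrust with the same coefficient for every prop. It keeps the hover commands unchanged after two same-spin motors 90° apart die, the way a failure-unaware mixer would before the PID loop reacts, and reports what the airframe actually gets.

```python
import math

# Hypothetical Flat-X octo indexing (not Betaflight's motor order): motor k
# sits at 22.5 + 45*k degrees on a unit-radius circle; even k spin CW, odd k CCW.
ANGLES = [math.radians(22.5 + 45 * k) for k in range(8)]
SPIN = [1 if k % 2 == 0 else -1 for k in range(8)]  # +1 = CW, -1 = CCW


def wrench(thrusts):
    """Total thrust and torque-like terms produced by per-motor thrusts (gf).

    roll/pitch are thrust times the motor's y/x position (unit arm length);
    yaw is CW thrust minus CCW thrust, which is proportional to net yaw
    torque if every prop has the same torque-to-thrust ratio.
    """
    total = sum(thrusts)
    roll = sum(t * math.sin(a) for t, a in zip(thrusts, ANGLES))
    pitch = -sum(t * math.cos(a) for t, a in zip(thrusts, ANGLES))
    yaw = sum(t * s for t, s in zip(thrusts, SPIN))
    return total, roll, pitch, yaw


hover = [125.0] * 8  # gf per motor at hover for a 1 kg drone
print("all motors:      ", wrench(hover))  # 1000 gf, torques cancel

# A static mixer keeps issuing the same per-motor commands after two
# same-spin motors 90 degrees apart (here 0 and 2) stop producing thrust.
dead = {0, 2}
broken = [0.0 if k in dead else t for k, t in enumerate(hover)]
print("motors 0, 2 dead:", wrench(broken))  # 750 gf, yaw term -250, net tilt torque
```

The result is 750 gf of lift against 1000 gf of weight, a 2:1 spin imbalance in yaw, and a torque tilting the drone toward the dead side. The PID loop will fight the tilt, but every correction it asks for is still split partly onto motors that no longer respond.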

P.S. -- while setting everything up in Betaflight, I set the startup chime of the drone to be Mask Off by Future :D

Also: the flat 8-arm frame has a much larger effective disc area than a typical quad, which means strong ground effect. Near the floor, the downwash can't escape freely, so pressure builds up under the rotors and each prop makes more thrust for the same throttle. The result is that the drone feels weirdly buoyant near the ground (you can see it floating in the video below).

My drone sings “Mask Off” by Future upon startup 🎶🎤 @1future https://t.co/GQh8Fyy5wJ #buildinginpublic pic.twitter.com/2uouzK0spv
— Karolina Dubiel (@karolina_dubiel) June 19, 2026

Floating on its own ground-effect cushion

A common question: why not MPC?

A lot of people on X have asked this -- to be honest, the primary reason is that I specifically wanted an RL project and designed this drone around that goal. I came up with the fault-tolerant octocopter concept as a vehicle for learning RL on real hardware, not the other way around.

That said, there are real engineering arguments for RL here:

Inference cost: MPC solves an optimization problem at every timestep, which is a lot of computation for an RPi commanding 8 motors. RL inference is a single forward pass through a ~50k-parameter network, which should take around 1 ms or less.
Unknown failure state: as I understand it, MPC normally needs an explicit model of what the system is currently doing, including which motors have failed, so without a dedicated fault detector it would mean extra work for me. The RL policy learns to infer the failure state implicitly from the gap between commanded and observed behavior.
Model mismatch tolerance: MPC is only as good as its model. My cheap motors probably aren't perfectly identical and the inertia tensor I measure will only be approximate. Heavy domain randomization during RL training explicitly teaches the policy to handle model error. An MPC controller built on the same uncertain model doesn't get that for free.

MPC is probably the more reliable choice for a project like this, but not the most fun option :D If I can't get RL working, MPC is absolutely my fallback -- and at that point I'd probably treat this whole attempt as useful data collection for a model anyway.

Day 13: What's next -- training an RL policy

Jun 11, 2026

Since posting on X, I've gotten many DMs asking exactly how I want to approach the next phase of this project: making the drone fly with RL. Here's the plan I have so far.

Most importantly, the RL policy will directly command all 8 motors at 50 Hz over a serial link to the flight controller with no traditional PID loop in the path. This is the only architecture that gives the policy full authority to reallocate thrust when motors fail.
I'm focusing on six unique failure classes (treating rotated versions of the same pattern as one class): single motor, adjacent pair (45°, mixed CW/CCW), 90° same-type, 135° mixed, 180° same-type, and full ESC loss (each of the two ESCs drives its own group of four motors). The hardest case is the 90° same-type failure, because it's the only one that hits both problems at once: a yaw torque imbalance (the two dead motors spun the same direction) and an asymmetry in the remaining thrust geometry.
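A short Python sketch of where the dual-failure classes come from: it labels each of the 28 possible motor pairs by angular separation and by whether the two motors share a spin direction, assuming spins alternate around the frame (the motor indexing is hypothetical).

```python
from collections import Counter
from itertools import combinations

N_MOTORS = 8
# Alternating spins around the frame (hypothetical indexing: even = CW, odd = CCW).
SPIN = ["CW" if k % 2 == 0 else "CCW" for k in range(N_MOTORS)]


def pair_class(i, j):
    """Name a dual failure by the angle between the motors and whether they share a spin."""
    steps = min((i - j) % N_MOTORS, (j - i) % N_MOTORS)  # 1..4 arm positions apart
    kind = "same-type" if SPIN[i] == SPIN[j] else "mixed"
    return f"{45 * steps}° {kind}"


counts = Counter(pair_class(i, j) for i, j in combinations(range(N_MOTORS), 2))
for name, n in sorted(counts.items(), key=lambda kv: int(kv[0].split("°")[0])):
    print(f"{name:15s} {n} of 28 pairs")
```

Odd separations (45°, 135°) always pair a CW with a CCW motor, and even ones (90°, 180°) always pair two motors of the same spin. Adding the single-motor case and the loss of one ESC (four motors at once) gives the six classes; which four motors share an ESC depends on the wiring, so that case isn't enumerated here.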

The circuit diagram that I drew for wiring everything up — The single- and dual-motor failures that I want to support, plus ESC loss

Losing two same-spin motors leaves 2 CW and 4 CCW running (or vice versa), yaw-torque imbalanced 2:1 at equal throttle. Balancing them forces the CW motors to run at 2× the per-motor thrust of the CCW motors. At 1393 gf max per motor, the yaw-balanced thrust ceiling works out to 5,572 gf -- enough to maintain a 2:1 thrust-to-weight ratio up to ~2.8 kg of drone weight (we're at 1 kg). The remaining 6 motors span a 270° arc, so roll and pitch authority still exists. The worst case is survivable -- the drone would be spinning, but it could still hover to a soft landing.

Metric	Full 8-motor	90° same-type (6 motors)
Max total thrust	11,144 gf	5,572 gf (yaw balanced)
CW motor load at hover (% of max thrust)	~9%	~18%
Max drone weight at 2:1 T/W	~5.6 kg	~2.8 kg
Yaw authority	full	near zero
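A quick way to check these numbers, as a Python sketch under the same simplifications as the paragraph above: yaw torque proportional to thrust with one coefficient for every prop, equal load within each spin group, and roll/pitch balance ignored. The function name and defaults are illustrative.

```python
T_MAX_GF = 1393.0   # full-throttle thrust per motor (manufacturer figure)
WEIGHT_GF = 1000.0  # drone weight used in the table


def yaw_balanced(n_cw, n_ccw, t_max=T_MAX_GF, weight=WEIGHT_GF):
    """Yaw-balanced thrust ceiling for the surviving motors.

    Assumes every prop's yaw torque is proportional to its thrust with the
    same coefficient, motors of one spin direction share load equally, and
    roll/pitch balance is ignored. Zero net yaw then means each spin group
    supplies exactly half the total thrust, so the smaller group saturates
    first and sets the ceiling.
    """
    small = min(n_cw, n_ccw)
    if small == 0:
        raise ValueError("no motors left in one spin direction; yaw cannot be balanced")
    ceiling = 2 * small * t_max                # gf
    hover_load = (weight / 2) / small / t_max  # per-motor fraction of max, smaller group
    max_weight_2to1 = ceiling / 2              # gf of drone that still leaves 2:1 T/W
    return ceiling, hover_load, max_weight_2to1


for label, n_cw, n_ccw in [("full 8-motor", 4, 4),
                           ("single motor out", 3, 4),
                           ("90° same-type", 2, 4)]:
    ceiling, load, max_w = yaw_balanced(n_cw, n_ccw)
    print(f"{label:17s} ceiling {ceiling:,.0f} gf, hover load {load:.0%}, "
          f"2:1 up to {max_w / 1000:.1f} kg")
```

The single-motor row is a useful sanity check on the raw figure from the hover post: once you insist on zero net yaw, the 7-motor ceiling is 8,358 gf rather than the raw ~9,750 gf, which still leaves 2:1 thrust-to-weight up to about 4.2 kg.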

Simulation

I'm building the sim in MuJoCo, because it runs fast on a CPU and I have a Mac, which rules out Isaac Lab and basically everything else NVIDIA-shaped. For a single rigid body with 8 thrust points, MuJoCo is more than enough, and I can run ~128 environments in parallel on my laptop.

The model itself comes from measurements, not the CAD. I'll be gathering data on:

Total mass
Inertia tensor via the bifilar pendulum test
Motor thrust curves
Motor time constant
Hover throttle point

I'm also adding two things to my sim environment that, from everything I keep reading, are what actually kill sim-to-real transfer for motor-level control:

1. Motor lag: real motors take 20–50 ms to reach a commanded speed. In sim, thrust changes instantly unless you model it. A policy that learns with instant motors learns to twitch.

2. Loop latency: on the real drone, there's ~15–30 ms between the IMU reading and thrust actually changing (serial read, inference, serial write, ESC response). If I train with zero latency, the policy will oscillate the second it touches hardware. This one scares me the most, so it's getting randomized aggressively (the policy trains against a delay that changes every episode and jitters within episodes).
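A minimal sketch of both effects for a single simulated episode, in plain Python rather than MuJoCo code. Assumptions (mine, not the project's): physics runs at 500 Hz, the 20–50 ms motor figure is used directly as a first-order time constant τ (the time to actually reach a speed is a few τ, so this errs on the slow, conservative side), and latency is drawn once per episode from 15–30 ms with ±1 physics step of jitter on every step. ActuatorModel is a hypothetical name.

```python
import math
import random


class ActuatorModel:
    """Delayed, first-order-lagged motors for one simulated episode (hypothetical)."""

    def __init__(self, n_motors=8, dt=0.002, seed=None):
        self.n = n_motors
        self.dt = dt                    # physics step: 500 Hz (assumed)
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Re-randomize per episode: each motor's time constant and the base latency."""
        self.tau = [self.rng.uniform(0.020, 0.050) for _ in range(self.n)]  # s
        self.base_delay = round(self.rng.uniform(0.015, 0.030) / self.dt)   # physics steps
        self.history = []       # every command issued this episode (fine for a sketch)
        self.applied_idx = -1   # index of the command currently reaching the ESCs
        self.thrust = [0.0] * self.n

    def step(self, command):
        """Advance one physics step; command holds one normalized thrust per motor."""
        self.history.append(list(command))
        now = len(self.history) - 1
        # +/-1 step of jitter, but an older command never overtakes a newer one.
        delay = max(0, self.base_delay + self.rng.randint(-1, 1))
        self.applied_idx = max(self.applied_idx, now - delay)
        target = self.history[self.applied_idx] if self.applied_idx >= 0 else [0.0] * self.n
        for i in range(self.n):
            # Exact discretization of d(thrust)/dt = (target - thrust) / tau.
            alpha = 1.0 - math.exp(-self.dt / self.tau[i])
            self.thrust[i] += alpha * (target[i] - self.thrust[i])
        return list(self.thrust)


motors = ActuatorModel(seed=0)
trace = [motors.step([1.0] * 8)[0] for _ in range(50)]  # motor 0's response to a step command
```

Calling reset() at the start of every episode is what makes the policy train against a different delay and different motor speeds each time, with the per-step jitter on top.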

puffer. just puffer. trust me. you will thank me in about a month from now.
— Chris von Csefalvay 🔜 CVPR26 (@epichrisis) June 7, 2026

Everything else physical gets randomized too: mass ±10%, per-motor thrust constants ±15% (cheap motors are not identical; I own eight data points proving this), center of mass, battery sag over a flight, and sensor noise^[4].
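A per-episode sampler for these parameters might look like the Python sketch below. Only the ±10% mass and ±15% per-motor thrust ranges come from this log; the center-of-mass, battery-sag, and gyro-noise ranges are placeholders, as is the linear sag model. The nominal mass defaults to the 1.177 kg used in the pendulum test.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    mass_kg: float
    thrust_scale: list     # per-motor multiplier on the nominal thrust constant
    com_offset_m: tuple    # (x, y, z) shift of the center of mass
    sag_per_s: float       # fraction of max thrust lost per second of flight
    gyro_noise_std: float  # rad/s, added to every gyro sample


def sample_episode(rng, nominal_mass_kg=1.177, n_motors=8):
    """Draw one episode's physics. Only the mass and thrust ranges come from the log;
    the CoM, sag, and noise ranges are placeholders."""
    return EpisodeParams(
        mass_kg=nominal_mass_kg * rng.uniform(0.9, 1.1),                  # mass +/-10%
        thrust_scale=[rng.uniform(0.85, 1.15) for _ in range(n_motors)],  # +/-15%, independent per motor
        com_offset_m=tuple(rng.uniform(-0.01, 0.01) for _ in range(3)),   # placeholder: +/-1 cm
        sag_per_s=rng.uniform(0.0, 0.002),                                # placeholder
        gyro_noise_std=rng.uniform(0.0, 0.02),                            # placeholder
    )


def thrust_available(params, t_s):
    """Placeholder battery-sag model: max thrust falls linearly with flight time."""
    return max(0.0, 1.0 - params.sag_per_s * t_s)


rng = random.Random(0)
params = sample_episode(rng)
print(params.mass_kg, params.thrust_scale[:3], thrust_available(params, 120.0))
```

Drawing each motor's thrust scale independently is the important part: it's what forces the policy to cope with eight slightly different motors instead of one idealized one.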

Training

PPO^[1] via PufferLib. I looked at SAC since it's more sample-efficient, but sample efficiency solves a problem I don't have -- my sim steps are nearly free. PPO with a pile of parallel environments is what almost every sim-to-real flight paper I've read actually shipped, and it plays nicer with heavy randomization. (Also: an X reply told me "puffer. just puffer. trust me.")
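PufferLib brings its own PPO trainer, so the following is not its code; it's a minimal, framework-free Python sketch of the clipped surrogate objective from the PPO paper^[1] that any PPO implementation optimizes, with the paper's clip_eps = 0.2 as an assumed default.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative of PPO's clipped surrogate objective, averaged over a batch.

    ratio = pi_new(a|s) / pi_old(a|s). Taking the min of the clipped and
    unclipped terms removes any incentive to push the ratio outside
    [1 - eps, 1 + eps] when that would improve the objective, while still
    fully penalizing changes that make it worse.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

For a good action (positive advantage) whose probability has already doubled, the objective is capped at 1.2 × advantage; for a bad action it keeps the full, unclipped penalty. That conservatism is part of why PPO tolerates heavily randomized, noisy returns well.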

Two more decisions I stole from sim-to-real literature:

1. The critic gets to cheat. During training, the value network sees ground truth the real drone will never have, like which motors are dead, the exact thrust constants, and true velocity. The actor only sees what real sensors provide. The critic gets thrown away after training, so this costs nothing at deployment. (This is called asymmetric actor-critic^[2], and I've read that it makes a huge difference when the physics are randomized this hard.)

2. No fault detector (for now). The policy sees its last 5 observation/action frames and has to figure out failures on its own, from the gap between what it commanded and what the drone did.
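A sketch of how the two inputs could be assembled in Python. The class name, the 7-value sensor vector (quaternion plus gyro), and the particular privileged fields are illustrative assumptions. The point is that the actor only ever sees stacked real-sensor frames and its own past commands, while the critic gets that plus sim-only ground truth.

```python
from collections import deque

HISTORY = 5  # the policy sees its last 5 observation/action frames


class ObsBuilder:
    """Builds actor and critic inputs for asymmetric actor-critic training."""

    def __init__(self, sensor_dim, act_dim=8):
        self.frame_dim = sensor_dim + act_dim
        self.frames = deque(maxlen=HISTORY)
        self.reset()

    def reset(self):
        # Start each episode with zero-filled history so the input size is fixed.
        self.frames.clear()
        for _ in range(HISTORY):
            self.frames.append([0.0] * self.frame_dim)

    def actor_obs(self, sensors, last_action):
        """Real-sensor readings plus the previous command, stacked oldest-first."""
        self.frames.append(list(sensors) + list(last_action))  # deque drops the oldest frame
        return [x for frame in self.frames for x in frame]

    @staticmethod
    def critic_obs(actor_obs, motor_alive, thrust_scale, true_velocity):
        """Actor input plus sim-only ground truth; the critic is discarded after training."""
        return (list(actor_obs) + [1.0 if a else 0.0 for a in motor_alive]
                + list(thrust_scale) + list(true_velocity))


builder = ObsBuilder(sensor_dim=7)  # hypothetical: quaternion (4) + gyro (3)
a = builder.actor_obs([0.0] * 7, [0.5] * 8)
c = ObsBuilder.critic_obs(a, [True] * 6 + [False] * 2, [1.0] * 8, [0.0, 0.0, 0.0])
print(len(a), len(c))  # 75 94
```

Because past commands sit in the history next to the sensor readings, the actor has what it needs to notice "I asked motor 3 for more thrust and nothing happened" without ever being told which motor died.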

Under a same-type dual failure the drone has near-zero yaw authority to spare: the torques only cancel if the two surviving same-spin motors each carry double the per-motor load of the other four, and any thrust spent fighting for a heading is thrust that isn't keeping the drone level. The right behavior is to give up on yaw, spin slowly about vertical, and stay level. If the reward punishes spinning, the policy sacrifices roll and pitch chasing a heading it can barely afford. Mueller & D'Andrea showed the same thing for quads losing a motor^[3] -- their recovering quad spins the whole time. Mine will too, on purpose.
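One way to write a reward that follows this reasoning, as a hypothetical Python sketch: there is no heading term at all, yaw rate is free up to a placeholder limit, and the cost goes into staying level and near the hover point. Every weight here is an untuned placeholder.

```python
def reward(up_z, roll_rate, pitch_rate, yaw_rate, pos_err_m, alive=True,
           free_yaw_rate=6.0):
    """Hypothetical per-step reward that deliberately has no heading term.

    up_z: world-frame z component of the body's up axis (1.0 = perfectly level).
    Rates are in rad/s, pos_err_m is distance from the hover target. All weights
    and free_yaw_rate are untuned placeholders.
    """
    if not alive:
        return -10.0                                # crash or out of bounds
    r = 1.0                                         # survival bonus
    r -= 2.0 * (1.0 - up_z)                         # stay level
    r -= 0.05 * (roll_rate**2 + pitch_rate**2)      # damp roll/pitch wobble
    r -= 0.5 * pos_err_m                            # stay near the hover point
    # Spinning is free up to a limit; only a runaway spin costs anything.
    excess = max(0.0, abs(yaw_rate) - free_yaw_rate)
    r -= 0.05 * excess**2
    return r
```

A steady spin below free_yaw_rate earns exactly the same reward as holding heading, so the optimizer has no reason to trade tilt for yaw; the small penalty above the limit only stops the spin from running away.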

Deployment

If the policy shows promising survival rates in sim, it'll get exported to ONNX and run on the RPi 4 (I think; any opinions on this vs. other board options?). The network is ~45k parameters, which should take well under a millisecond per inference, so the Pi is not the bottleneck. The 50 Hz loop will read attitude and gyro over serial, run the policy, and write 8 motor commands.
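A minimal Python sketch of that fixed-rate loop as planned at this stage, with the serial read, ONNX inference, and serial write passed in as stand-in callables (all hypothetical; no real serial or ONNX API is used here). It clamps commands to [0, 1] and resyncs after an overrun rather than firing several late iterations back to back.

```python
import time


def run_control_loop(read_state, policy, write_motors, rate_hz=50.0, steps=None):
    """Fixed-rate read -> infer -> write loop; returns the number of iterations run.

    read_state, policy, and write_motors are injected callables standing in
    for the serial read, the ONNX inference call, and the serial write.
    """
    period = 1.0 / rate_hz
    next_tick = time.monotonic()
    n = 0
    while steps is None or n < steps:
        state = read_state()                                     # attitude + gyro from the FC
        cmds = [min(1.0, max(0.0, c)) for c in policy(state)]   # clamp to [0, 1]
        write_motors(cmds)                                       # 8 motor commands
        n += 1
        next_tick += period
        sleep_s = next_tick - time.monotonic()
        if sleep_s > 0:
            time.sleep(sleep_s)
        else:
            next_tick = time.monotonic()  # overran the 20 ms slot: resync instead of bursting
    return n


sent = []
n = run_control_loop(lambda: [0.0] * 6, lambda s: [1.2] + [0.5] * 7, sent.append, steps=5)
```

Scheduling against time.monotonic() instead of sleeping a fixed 20 ms keeps the loop from drifting as inference and serial I/O times vary.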

Then, the actual experiment: fly, kill motors from the transmitter, and find out if millions of simulated crashes taught it anything!

[1] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal Policy Optimization Algorithms," arXiv:1707.06347, 2017.
[2] L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel, "Asymmetric Actor Critic for Image-Based Robot Learning," RSS, 2018.
[3] M. W. Mueller and R. D'Andrea, "Stability and control of a quadrocopter despite the complete loss of one, two, or three propellers," IEEE ICRA, 2014.
[4] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World," IROS, 2017.

Day 11: Insider traitor-ing

Jun 9, 2026

Flashing the firmware didn't go as planned -- the USB-C input port on my H743 AeroSelfie FC is broken. This isn't the biggest deal in the world; everything on that board was pre-soldered and detaching it for return was pretty easy. The annoying part is that a broken FC is pretty critical-path, and nothing can progress until the new one comes in (thankfully soon!).

I attached the standoffs and top body plate to the drone to get an idea of what everything would look like all together and weigh the assembled drone.

The drone with the top plate and standoffs attached

The drone weighs exactly 1kg with a mounted battery (this weight includes everything except the flight controller, which is negligible). Each motor produces approximately 950gf of thrust at 70% throttle on a fully charged 6S battery, and up to 1393gf at full throttle. Across all 8 motors, that's 7,600gf -- 7.6kg of thrust -- at 70% throttle alone, against 1kg of weight, which gives a thrust-to-weight ratio of 7.6:1. To hover, I only need 125gf per motor, which is around 15-20% throttle. That means the drone has enormous headroom above hover -- at 70% throttle it's producing nearly 8x what it needs to stay airborne. This is really, really good! An overpowered drone = way more leeway to tolerate (or ideally, fully recover from and maintain normal flight during) motor loss.

I'm kind of blocked until the new FC arrives. I'm not used to blockers like this (given my all-software background), but I'm reminding myself that it's just part of hardware to have stuff like this happen. Annoying ≠ discouraging, and I'm really excited to see this drone hover soon.

Day 9: Wired up!

Jun 7, 2026

I soldered the flight controller, two ESCs, GPS, battery wires, and receiver together! The drone is theoretically able to hover now, but I haven't tested that yet :D

I'm pretty inexperienced with soldering, so this part took me longer than any of the CAD/assembly so far. Given that I've never assembled electronics together this way, it was difficult for me to imagine how everything would fit together and to solder everything neatly. I ended up deciding to just have one battery wire, which I sandwiched between the two ESCs, so that it could serve them both.

The ESCs and flight controller are stacked on top of each other to keep the center of gravity in the middle of the frame. Once I fly this drone as a regular octocopter, I'll also have to mount a Raspberry Pi or Jetson Nano (open to feedback here!) onboard to run the inference. I plan on sticking this board to the bottom of the top plate.

Before I test hovering, I'll need to:

Flash Bluejay firmware to both ESCs
Configure Betaflight: set the mixer to Octocopter Flat X, ESC protocol to DSHOT600, enable the accelerometer, and dial in conservative rates and arming parameters
Configure failsafe behavior in case the RC signal drops mid-flight
Balance check: find the battery position that centers the CoM and lock it down

Day 6: Superglued, taped down, and ready to solder

Jun 4, 2026

As soon as the jig was printed, I used it to align the arms of the drone and filled any gaps in between with superglue.

The phase wires for the motors are now taped down and the frame is perfectly aligned

Filling the space between the arms with superglue while the drone is in the jig

The arms were fully stable once the superglue set, which means I won't have any vibrational issues due to the imperfect arm alignment that I was worried about earlier. My original plan was to start soldering all the electronics today, but I'm still waiting on a soldering iron shipment. Planning to start wiring everything as soon as materials and my full-time job allow :D

Supergluing the arms in the jig required loosening the screws to get the arms to fully pop into the jig supports. One of the screws got stuck, snapped, and had to be drilled out :( Crisis averted with very minimal damage to the frame, though!

Day 4: Getting jiggy with it

Jun 2, 2026

I continued with assembly, screwing the 8 motors to the arms and the 8 propellers to the motors.

The assembled drone frame (arms, bottom and middle plates, motors, and propellers)

A screenshot from a timelapse of me attaching the motors + propellers to the body

A small problem: if you tug really hard, some of the arms wiggle a bit, even when fully screwed together and tightened. I think this is due to the fact that I set the cut tolerance as 0.1mm in the CAD, not knowing how precise the CNC mill would be. For the future, a better tolerance would be 0.05mm or 0.08mm. Any wiggle room in the arms can cause vibrations when flying, which could mess up my flight dynamics and make RL-based flying impossible.

The solution for this is to 3D print a 0-tolerance assembly jig to hold the arms in perfect position while the center of the drone is superglued together. Here's the design of said jig -- it'll be printed and ready to use soon:

Day 3: Assemble!

Jun 1, 2026

After a weekend away at Pinnacles National Park, I screwed the body of the drone together: the 8 arms, the bottom plate, and the middle plate.

Work in progress ... screwing all of the arms together

Day 1: CAD, CNC milling, and humble beginnings

May 30, 2026

While on a recent vacation in Guatemala, I came up with the idea for this project from a hammock on the shores of Lake Atitlán. Immediately, I ordered (most of) the necessary parts on Amazon and started ideating on exactly how to go about building a fully RL-powered, intelligently fault-tolerant octocopter.

I have never done a substantial hardware project before. I have never CADed, I've soldered once, and I have no experience with drone flight controllers, speed controllers, or anything in that domain. I have never flown a drone before. I have never trained an RL policy as complex as the one required for this project.
I got started thanks to hours spent on Google, Reddit, Claude, and talking to Tomas, an AE major who helped with every CAD and machine shop question I had.

The first two steps of this project were both started and completed today:
1. CAD of the drone's body and arms in Fusion360
2. CNC milling the arms out of G-10 fiberglass and the body out of 5mm carbon fiber

Fusion 360 CAD render with all eight motors placed on the frame — Fusion360 view of the finished CAD -- full octocopter with third-party motor/propeller .step files imported into my design

Top-down layout of the arm geometry in CAD — Intertwined arm geometry layout

The arms intertwine in the center of the drone for stability. They're sandwiched between a flower-shaped bottom plate and a larger body plate on top. I decided on flower cutouts for the carbon fiber body.

Arms drawing exported for CNC milling — Arms -- prepared for the CNC mill

Body plate drawing exported for CNC milling — Body plates -- prepared for the CNC mill

After that, it was time to CNC cut. This was my first time in a machine shop :D
I ended up having to re-cut the arms because the drill was going too fast and pushed the G-10 plate as it was cutting. I learned that cutting at ~20% speed when using thick materials is a much better idea than having to re-cut due to going too fast.

CNC toolpath simulation for the arm cuts — The CNC mill cutting out the arms

CNC mill cutting the carbon fiber body plate — Me preparing the CNC mill

Freshly CNC-cut parts laid out — Freshly cut G-10 and carbon fiber parts
