Over-the-air (OTA) firmware updates enable remote modification of embedded device software without physical access, and they have become essential for maintaining security, fixing bugs, and adding features to deployed IoT devices throughout their operational lifetime. An OTA system consists of several components: a firmware build and signing pipeline, a distribution server or CDN, a device-side update agent, a bootloader capable of managing multiple firmware slots, and a reporting mechanism for update status. The most common architecture uses A/B (dual-bank) partitioning: the new firmware is downloaded to an inactive slot while the current firmware continues running, and on the next reboot the bootloader verifies the new image's cryptographic signature before switching to it. This approach ensures the device always has a known-good fallback if the update fails. Delta (differential) updates reduce download size by transmitting only the binary difference between firmware versions, which is critical for bandwidth-constrained devices using LoRaWAN or NB-IoT, where full image transfers would take hours.
How Does A/B Partition OTA Work?
In an A/B partition scheme, the device's flash memory is divided into a bootloader region, Slot 0 (primary/active), and Slot 1 (secondary/staging). When an update is available, the update agent downloads the new signed firmware image into Slot 1 while the application in Slot 0 continues operating normally. Once the download is complete and the image checksum passes, the device flags Slot 1 as pending and reboots. The bootloader (typically MCUboot) reads the pending flag, verifies the cryptographic signature of the image in Slot 1, and either swaps the two slots or boots directly from the new slot. After the first boot, the application must confirm the update by setting a "confirmed" flag; otherwise, the bootloader reverts to the previous image on the next reboot, which provides automatic rollback.
/* MCUboot swap states and OTA flow */
// Device flash layout
// ┌──────────────────┐ 0x00000
// │ MCUboot          │ (32 KB)
// ├──────────────────┤ 0x08000
// │ Slot 0 (Active)  │ (384 KB)
// ├──────────────────┤ 0x68000
// │ Slot 1 (Staging) │ (384 KB)
// ├──────────────────┤ 0xC8000
// │ Scratch Area     │ (16 KB)
// └──────────────────┘
// OTA update agent pseudocode
void ota_update_task(void) {
    fw_image_t *image = ota_check_for_update();
    if (image == NULL) return;
    // Download to Slot 1
    ota_download_to_slot1(image->url, image->size);
    // Verify checksum before requesting swap
    if (verify_sha256(SLOT1_ADDR, image->size, image->hash)) {
        // Request a test upgrade: MCUboot reverts unless the new image confirms itself
        boot_request_upgrade(BOOT_UPGRADE_TEST);
        NVIC_SystemReset();
    }
}
// After boot, confirm update succeeded
void app_init(void) {
    if (!boot_is_img_confirmed()) {
        if (self_test_passed()) {
            boot_write_img_confirmed();
        }
        // If not confirmed, MCUboot will revert on next reboot
    }
}
What Are Delta Updates and When Should You Use Them?
Delta (differential) updates transmit only the binary differences between the current and target firmware versions, dramatically reducing download size. Tools like jdiff, bsdiff, or Mender's delta update module can produce a patch that is 10-30% of the full image size. This is critical for LPWAN-connected devices, where a 256 KB firmware image at LoRaWAN data rates could take 30+ minutes of airtime as a full image but under 5 minutes as a delta, before duty-cycle limits stretch both figures further. The trade-off is that delta updates require the device to have enough RAM or scratch flash to reconstruct the full image, and a patch must be generated for each specific source-version-to-target-version transition.
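To make the reconstruction step concrete, here is a minimal sketch of applying a patch on the device. It assumes a deliberately simplified, hypothetical patch format made of COPY and INSERT records; it is not the bsdiff, jdiff, or Mender format, and the name `delta_apply` is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy patch format (NOT bsdiff/jdiff/Mender):
 *   0x00 <off:u32 LE> <len:u16 LE>   copy len bytes from the old image at off
 *   0x01 <len:u16 LE> <len bytes>    insert literal bytes carried in the patch
 */
enum { DELTA_OP_COPY = 0x00, DELTA_OP_INSERT = 0x01 };

static uint32_t rd_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Rebuilds the new image into out. Returns the number of bytes written,
 * or -1 if the patch is malformed or the output buffer is too small. */
long delta_apply(const uint8_t *old, size_t old_len,
                 const uint8_t *patch, size_t patch_len,
                 uint8_t *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    while (pos < patch_len) {
        uint8_t op = patch[pos++];

        if (op == DELTA_OP_COPY) {
            if (patch_len - pos < 6) return -1;
            uint32_t off = rd_u32(&patch[pos]);
            uint16_t len = rd_u16(&patch[pos + 4]);
            pos += 6;
            /* Reject reads outside the old image and writes past the buffer. */
            if (off > old_len || len > old_len - off) return -1;
            if (len > out_cap - written) return -1;
            memcpy(&out[written], &old[off], len);
            written += len;
        } else if (op == DELTA_OP_INSERT) {
            if (patch_len - pos < 2) return -1;
            uint16_t len = rd_u16(&patch[pos]);
            pos += 2;
            if (len > patch_len - pos) return -1;
            if (len > out_cap - written) return -1;
            memcpy(&out[written], &patch[pos], len);
            pos += len;
            written += len;
        } else {
            return -1; /* unknown opcode */
        }
    }
    return (long)written;
}
```

On a real MCU the output would normally be written to the staging slot in flash rather than a RAM buffer, and the rebuilt image would still go through hash and signature verification before the swap is requested.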
What Cloud Platforms Support IoT OTA Updates?
Major OTA update platforms and their strengths:
- AWS IoT Jobs + S3: Scalable update delivery with fleet management, MQTT-based job notifications, and integration with FreeRTOS OTA library.
- Azure Device Update for IoT Hub: Supports A/B updates and delta updates with device groups and phased rollouts.
- Mender.io: Open-source OTA platform with MCU support, delta updates, and a hosted SaaS option. Popular in the embedded community.
- Memfault: Combines OTA updates with device monitoring, crash reporting, and fleet metrics. Supports MCUboot and custom bootloaders.
- Golioth: Cloud platform specifically designed for MCU-based IoT devices with built-in OTA, logging, and settings management.
How Do You Prevent OTA Update Failures?
OTA reliability requires careful engineering. Verify integrity at multiple levels: transport-layer protection (TLS), an image-level SHA-256 hash, and signature verification. Use watchdog timers to detect boot loops and trigger automatic rollback. Test updates against all hardware revisions in your fleet before wide deployment. Implement staged rollouts (canary deployments) where updates reach 1%, then 10%, then 100% of devices, monitoring error rates at each stage. Maintain a persistent update status log so the cloud can track which devices successfully updated, which failed, and why. For battery-powered devices, check the battery level before starting an update, because an update that fails mid-write due to power loss can brick the device.
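Two of these safeguards, staged rollouts and the battery check, can be sketched as a pre-flight gate. This is a minimal sketch under stated assumptions: the cohort is derived on the device by hashing its ID into one of 100 buckets (many platforms decide this server-side instead), and the names `rollout_bucket`, `rollout_includes`, and `ota_preflight_ok` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a 32-bit hash, used here only to spread device IDs evenly over buckets. */
static uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Maps a device ID to a stable bucket in the range 0..99. */
uint8_t rollout_bucket(const char *device_id) {
    return (uint8_t)(fnv1a32(device_id) % 100u);
}

/* A device belongs to the current stage when its bucket is below the
 * rollout percentage, so raising the percentage only ever adds devices. */
bool rollout_includes(const char *device_id, uint8_t percent) {
    return rollout_bucket(device_id) < percent;
}

/* Hypothetical gate: start the update only if the device is in the active
 * stage and the battery is above a caller-chosen threshold. */
bool ota_preflight_ok(const char *device_id, uint8_t rollout_percent,
                      uint8_t battery_percent, uint8_t min_battery_percent) {
    return rollout_includes(device_id, rollout_percent) &&
           battery_percent >= min_battery_percent;
}
```

Because the bucket is a pure function of the device ID, a device that was part of the 1% stage stays included at 10% and 100%, which keeps per-stage error rates comparable.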
Key takeaway: OTA firmware updates require A/B flash partitioning for safe rollback, cryptographic signature verification (ECDSA P-256 or Ed25519) to prevent unauthorized code execution, monotonic version counters for anti-rollback, and staged fleet deployment. Delta updates reduce download size by 70-90% for bandwidth-constrained LPWAN devices.
How Did We Handle OTA Updates for a Fleet of 10,000 Devices?
At EmbedCrest, we designed and deployed the OTA infrastructure for a smart metering company with 10,000 LoRaWAN-connected water meters across a metropolitan area. Each meter used an STM32L4 MCU with 1 MB flash partitioned into 32 KB MCUboot, 448 KB Slot 0 (active), 448 KB Slot 1 (staging), and 16 KB scratch area. Due to LoRaWAN bandwidth constraints (approximately 50 bytes of payload per packet at SF10), a full 200 KB firmware image would take over 4,000 packets and 40+ hours to transfer. We implemented bsdiff-based delta updates that compressed typical firmware changes to 15-30 KB, reducing transfer time to 3-6 hours per device. The update process used multicast Class C mode: meters temporarily switched from Class A (lowest power) to Class C (continuous listening) during a scheduled maintenance window, received delta fragments via multicast downlinks, and switched back to Class A after confirmation. We deployed in 5 stages: a 1% canary group, then 10%, 25%, 50%, and 100%, monitoring success rates, boot loop detection, and power consumption anomalies at each stage.
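The packet and time figures above follow from simple arithmetic, sketched here. The 50-byte payload comes from the text; the average spacing between packets is an assumption, and the function names are hypothetical.

```c
#include <stdint.h>

/* Number of fragments needed to carry image_bytes when each packet holds
 * payload_bytes of firmware data (rounded up to a whole packet). */
uint32_t ota_packets_needed(uint32_t image_bytes, uint32_t payload_bytes) {
    return (image_bytes + payload_bytes - 1u) / payload_bytes;
}

/* Total transfer time in seconds for a given average spacing between packets.
 * The spacing is an input here because it depends on data rate, duty-cycle
 * limits, and the network server's downlink scheduling. */
uint32_t ota_transfer_seconds(uint32_t packets, uint32_t seconds_per_packet) {
    return packets * seconds_per_packet;
}
```

With 50-byte payloads, a 200 KB image needs 4,096 packets, while 15 KB and 30 KB deltas need 308 and 615. Assuming roughly 36 seconds between packets (a value chosen only because it lands near the 40-hour figure, not a measurement), that works out to about 41 hours for the full image and roughly 3 to 6 hours for the deltas.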
What Are the Most Dangerous OTA Failure Modes?
The most critical OTA failure mode is a bricked device that cannot recover without physical access. This occurs when the bootloader itself is corrupted or when both firmware slots contain invalid images. Mitigate this by making the bootloader immutable (placed in a write-protected flash region), implementing a golden image recovery partition (a minimal firmware in a separate protected flash area that the bootloader falls back to when neither slot is bootable), and enforcing battery level checks before starting the update process. On STM32 devices, configure the Option Bytes to write-protect the bootloader flash region using the WRP (Write Protection) bits. Another dangerous failure is a successful update that introduces a subtle regression detected only weeks later, after the entire fleet has been updated. Counter this by implementing automatic health checks: the updated firmware must confirm itself within a configurable timeout (30-300 seconds) by calling boot_write_img_confirmed(); if it does not, a watchdog or the application resets the device and the bootloader reverts to the previous image. Include functional self-tests (sensor reading validation, communication loopback, watchdog verification) in the confirmation logic, not just a simple timer.
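The confirmation logic described above can be sketched as a small decision function. This is a minimal sketch, not MCUboot's API: the verdict enum, `ota_confirmation_verdict`, and the two placeholder self-tests are hypothetical, and the caller is assumed to call boot_write_img_confirmed() on a confirm verdict and reset the device on a revert verdict.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*self_test_fn)(void);

typedef enum {
    OTA_CONFIRM, /* every self-test passed in time: mark the image confirmed */
    OTA_RETRY,   /* a test failed but the window is still open: try again later */
    OTA_REVERT   /* window expired: reset unconfirmed so the bootloader rolls back */
} ota_verdict_t;

/* Placeholder self-tests for illustration; real ones would exercise hardware. */
bool sensor_in_range(void)   { return true; }
bool radio_loopback_ok(void) { return false; }

/* Decides what to do with a freshly booted, not-yet-confirmed image.
 * elapsed_s is the time since boot; timeout_s is the confirmation window. */
ota_verdict_t ota_confirmation_verdict(const self_test_fn *tests, size_t count,
                                       uint32_t elapsed_s, uint32_t timeout_s)
{
    if (elapsed_s >= timeout_s) {
        return OTA_REVERT;
    }
    for (size_t i = 0; i < count; i++) {
        if (!tests[i]()) {
            return OTA_RETRY; /* e.g. the radio has not joined the network yet */
        }
    }
    return OTA_CONFIRM;
}
```

Allowing retries inside the window avoids reverting a good image just because a slow dependency, such as a network join, was not ready on the first attempt.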
How Do You Choose Between Full Image and Delta OTA Updates?
Full image updates transmit the complete firmware binary, verified by SHA-256 hash and ECDSA signature. They are simpler to implement, require no client-side reconstruction logic, and can update to any version from any version. However, they consume maximum bandwidth and transfer time. Delta updates transmit only the binary difference between the current and target versions, generated using algorithms like bsdiff, xdelta3, or Mender's mender-binary-delta. Delta patches are typically 10-30% of the full image size for minor updates and 40-60% for major refactors. The trade-offs are significant: delta updates require the device to have sufficient RAM or scratch flash to apply the patch (the reference bspatch tool holds both the old and new images in memory, so MCU targets generally need a streaming variant that writes the result to flash), they must be generated for each source-to-target version combination (N versions create N-1 patches for sequential updates or N*(N-1)/2 for arbitrary forward version jumps), and a corrupted current image prevents patch application. For LPWAN-connected devices with severe bandwidth constraints, delta updates are essential. For Wi-Fi-connected devices with ample bandwidth, full image updates are simpler and more robust.
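The patch-count formulas above grow very differently as releases accumulate, which a short sketch makes visible. The function names are hypothetical, and "arbitrary" is taken to mean upgrades from any older release to any newer one.

```c
#include <stdint.h>

/* Patches needed when devices always step through every release in order:
 * one patch per adjacent pair of versions. */
uint32_t patches_sequential(uint32_t n_versions) {
    return n_versions > 0u ? n_versions - 1u : 0u;
}

/* Patches needed to upgrade from any older release directly to any newer one:
 * one patch per unordered pair of versions. */
uint32_t patches_arbitrary(uint32_t n_versions) {
    return n_versions > 0u ? n_versions * (n_versions - 1u) / 2u : 0u;
}
```

For 10 releases this is 9 patches versus 45, which is one reason fleets that use delta updates often restrict the set of source versions a device is allowed to upgrade from.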


