ECP-0082: UniFi PXE Rollout Path for Runner Images

Status: Implemented

Note: Persistent declarative host operation is specified in ECP-0083.

Context

CI now publishes runner netboot artifacts, but the repository has no first-party procedure for provisioning a fleet on common prosumer networks (for example, UniFi VLANs).

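UniFi DHCP can set next-server and bootfile options, but iPXE chainloading needs conditional bootfile behavior to avoid a loop: firmware PXE clients must receive ipxe.efi (first stage), while clients already running iPXE must receive the boot script (second stage). If both stages are handed ipxe.efi, iPXE keeps reloading itself. Not all controller setups expose that condition cleanly.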
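
The loop-avoidance rule can be sketched as a small decision function. This is a minimal illustration, not the repository's implementation. It assumes the second stage is detected through the DHCP user class (option 77), which iPXE sets to `iPXE`. The function name `select_bootfile` and the server URL are hypothetical; `ipxe.efi` and `netboot.ipxe` are the file names used in this proposal.

```python
from typing import Optional

# First stage: the iPXE binary, fetched over TFTP by firmware PXE clients.
FIRST_STAGE = "ipxe.efi"
# Second stage: the boot script. The host and port are hypothetical
# placeholders; the real URL depends on how the boot server is set up.
SECOND_STAGE = "http://192.0.2.10:8080/netboot.ipxe"


def select_bootfile(user_class: Optional[str]) -> str:
    """Pick the bootfile for a client from its DHCP user class (option 77)."""
    # A client that already runs iPXE announces user class "iPXE".
    # Handing it ipxe.efi again would restart the first stage forever.
    if user_class == "iPXE":
        return SECOND_STAGE
    # Anything else is treated as plain firmware PXE and gets the binary.
    return FIRST_STAGE


if __name__ == "__main__":
    print(select_bootfile(None))    # firmware PXE client
    print(select_bootfile("iPXE"))  # client already running iPXE
```

In the ProxyDHCP design below, this is the decision dnsmasq makes on behalf of the controller, while address assignment stays with the existing DHCP server.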

Decision

  1. Add first-party scripts for local netboot staging and serving:
    • stage x86_64 netboot artifacts from Forgejo Releases (or a local tarball),
    • stage the iPXE UEFI binary for TFTP,
    • run HTTP + TFTP + ProxyDHCP via dnsmasq for deterministic chainloading.
  2. Keep UniFi DHCP as the sole authority for IP address assignment; use ProxyDHCP only to supply bootfile logic.
  3. Document a concrete NUC rollout sequence for same-VLAN provisioning.
  4. Keep dependencies minimal (curl, tar, python3, dnsmasq) and do not require image-flashing workflows.
  5. Support an optional UniFi-only mode by providing an embedded-script iPXE build path (ec-ipxe.efi) so clients can chainload without DHCP conditional logic.
  6. Verify release artifact integrity during staging when SHA256SUMS.txt is published.
  7. Harden serving/staging defaults:
    • default to local iPXE artifacts (remote iPXE download requires explicit opt-in),
    • support optional chain token protection for netboot.ipxe,
    • support HTTP CIDR allowlists for artifact serving.
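
The integrity and hardening checks in items 6 and 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the shipped scripts: it assumes SHA256SUMS.txt uses the `sha256sum` output format (`<hex digest>  <file name>` per line), and it does not assume any particular way of carrying the chain token in the request. All function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress
from pathlib import Path
from typing import Dict, Iterable


def parse_sha256sums(text: str) -> Dict[str, str]:
    """Parse sha256sum-style lines into a {file name: hex digest} table."""
    sums: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.strip()
        if name.startswith("*"):  # sha256sum marks binary mode with "*"
            name = name[1:]
        sums[name] = digest.lower()
    return sums


def verify_artifact(path: Path, sums: Dict[str, str]) -> bool:
    """Return True only if the file is listed and its SHA-256 matches."""
    expected = sums.get(path.name)
    if expected is None:
        return False  # assumption: unlisted files count as unverified
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return hmac.compare_digest(h.hexdigest(), expected)


def client_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if the client address falls inside any allowed CIDR."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)


def token_ok(presented: str, expected: str) -> bool:
    """Compare chain tokens in constant time to avoid timing leaks."""
    return hmac.compare_digest(presented.encode(), expected.encode())
```

A staging step would call `verify_artifact` for each downloaded file before publishing it, and a serving step would check `client_allowed` (and `token_ok` for netboot.ipxe, when a token is configured) before answering a request.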

Alternatives considered

  • Require UniFi DHCP conditional iPXE rules. Rejected because controller capabilities vary and a misconfiguration can cause boot loops.
  • Keep manual USB-only provisioning. Rejected because it increases labor for multi-node rollout.
  • Add a heavy provisioning stack (MAAS/Foreman/Kickstart integration). Rejected as too much operational overhead for current scale.

Rollout / teardown plan

  • Rollout:
    • merge scripts/docs,
    • run netboot-stage on the boot server,
    • run netboot-serve on the NUC VLAN and boot hosts via PXE.
  • Teardown:
    • stop netboot-serve,
    • remove staged artifacts under tmp/netboot,
    • continue with the ISO+USB fallback path.