every.channel/evolution/proposals/ECP-0082-unifi-pxe-runner-rollout.md

44 lines
2.2 KiB
Markdown

# ECP-0082: Unifi PXE Rollout Path for Runner Images
Status: Implemented
Note: Persistent declarative host operation is specified in ECP-0083.
## Context
Runner netboot artifacts now publish from CI, but there is no repository-native operating path for fleet provisioning on common prosumer networks (for example Unifi VLANs).
Unifi DHCP can expose next-server/bootfile settings, but iPXE chainloading often requires conditional bootfile behavior to avoid loops (`ipxe.efi` first stage, script second stage). Not all controller setups expose that cleanly.
## Decision
1. Add first-party scripts for local netboot staging and serving:
- stage x86_64 netboot artifacts from Forgejo Releases (or local tarball),
- stage iPXE UEFI binary for TFTP,
- run HTTP + TFTP + ProxyDHCP via `dnsmasq` for deterministic chainloading.
2. Keep Unifi DHCP as the IP authority; use ProxyDHCP only to supply bootfile logic.
3. Document a concrete NUC rollout sequence for same-VLAN provisioning.
4. Keep dependencies minimal (`curl`, `tar`, `python3`, `dnsmasq`) and avoid requiring image flashing workflows.
5. Support an optional UniFi-only mode by providing an embedded-script iPXE build path (`ec-ipxe.efi`) so clients can chainload without DHCP conditional logic.
6. Verify release artifact integrity during staging when `SHA256SUMS.txt` is published.
7. Harden serving/staging defaults:
- default to local iPXE artifacts (remote iPXE download requires explicit opt-in),
- support optional chain token protection for `netboot.ipxe`,
- support HTTP CIDR allowlists for artifact serving.
## Alternatives considered
- Require Unifi DHCP conditional iPXE rules. Rejected because controller capabilities vary and misconfiguration risks boot loops.
- Keep manual USB-only provisioning. Rejected because it increases labor for multi-node rollout.
- Add a heavy provisioning stack (MAAS/Foreman/Kickstart integration). Rejected as too much operational overhead for current scale.
## Rollout / teardown plan
- Rollout:
- merge scripts/docs,
- run `netboot-stage` on the boot server,
- run `netboot-serve` on the NUC VLAN and boot hosts via PXE.
- Teardown:
- stop `netboot-serve`,
- remove staged artifacts under `tmp/netboot`,
- continue with ISO+USB fallback path.