ops: add CI boot-image releases and Unifi PXE rollout
Some checks failed
ci-gates / checks (push) Has been cancelled
deploy-cloudflare / checks (push) Has been cancelled
deploy-cloudflare / deploy (push) Has been cancelled

This commit is contained in:
every.channel 2026-02-28 22:53:59 -08:00
parent 043b1730dc
commit be26313225
No known key found for this signature in database
9 changed files with 720 additions and 0 deletions

102
docs/NUC_UNIFI_NETBOOT.md Normal file
View file

@ -0,0 +1,102 @@
# NUC Fleet Netboot (Unifi + ProxyDHCP)
This runbook provisions x86_64 NUCs from runner netboot artifacts without USB image flashing.
It uses:
- Unifi DHCP for IP leases.
- Local `dnsmasq` ProxyDHCP for PXE/iPXE bootfile logic.
- Local HTTP + TFTP service for boot artifacts.
## Why ProxyDHCP
iPXE commonly needs two boot stages:
1. firmware PXE -> `ipxe.efi`
2. iPXE -> `netboot.ipxe`
If DHCP always returns `ipxe.efi`, clients can loop forever. ProxyDHCP handles stage-specific boot responses cleanly while leaving Unifi as the DHCP lease server.
## Prerequisites
- A Linux boot server on the same VLAN/L2 domain as the NUCs.
- Unifi network with normal DHCP enabled.
- Local DNS record on that VLAN: `boot.every.channel -> <boot-server-ip>`.
- `curl`, `tar`, `python3`, `dnsmasq` installed on the boot server.
- Runner netboot artifact already published to Forgejo Releases (or available as a local tarball).
## 1) Stage artifacts
From repository root on the boot server:
```sh
./scripts/netboot-stage.sh
```
Optional inputs:
- `EVERY_CHANNEL_NETBOOT_RELEASE_TAG=boot-v2026.02.28`
- `EVERY_CHANNEL_NETBOOT_TARBALL=/path/to/ec-runner-x86_64-netboot-....tar.gz`
- `EVERY_CHANNEL_FORGE_TOKEN=<token>` for private releases
- `EVERY_CHANNEL_NETBOOT_HOSTNAME=boot.every.channel`
This stages:
- `tmp/netboot/http/{kernel,initrd,netboot.ipxe}`
- `tmp/netboot/tftp/ipxe.efi`
## 2) Serve HTTP + TFTP + ProxyDHCP
Example (replace values for your VLAN):
```sh
sudo \
EVERY_CHANNEL_NETBOOT_LISTEN_IP=10.20.30.2 \
EVERY_CHANNEL_NETBOOT_INTERFACE=eth0 \
EVERY_CHANNEL_NETBOOT_PROXY_SUBNET=10.20.30.0/24 \
EVERY_CHANNEL_NETBOOT_HOSTNAME=boot.every.channel \
./scripts/netboot-serve.sh
```
Notes:
- Keep this process running during provisioning.
- Do not set Unifi DHCP bootfile options while this proxy mode is active.
- Ensure `boot.every.channel` resolves to the boot server IP from NUC clients.
## 3) Unifi / NUC settings
Unifi:
- Keep DHCP enabled for the provisioning VLAN.
- Leave DHCP boot/TFTP overrides unset when using `netboot-serve.sh`.
- Create/verify local DNS host override: `boot.every.channel -> <boot-server-ip>`.
NUC BIOS:
- Enable UEFI network boot (IPv4 PXE).
- Disable Legacy/CSM if possible.
- Put network boot before disk for first install cycle.
## 4) Provision the fleet
1. Boot each NUC on the provisioning VLAN.
2. PXE will chainload into iPXE and then runner `netboot.ipxe`.
3. Complete install/bootstrap flow on each node.
4. After successful install, switch boot order back to local disk.
## Troubleshooting
- Symptom: iPXE loop (`ipxe.efi` repeatedly)
- Cause: static DHCP bootfile without iPXE-aware logic.
- Fix: use ProxyDHCP flow (`netboot-serve.sh`) or set conditional DHCP rules.
- Symptom: NUC gets IP but never downloads boot artifacts
- Verify firewall allows UDP 67/68, UDP 69, and TCP 8080 between NUCs and boot server.
- Symptom: no `dnsmasq` offers seen
- Verify `EVERY_CHANNEL_NETBOOT_INTERFACE` and `EVERY_CHANNEL_NETBOOT_PROXY_SUBNET`.
## Security / networking
- Tailscale is not required for provisioning.
- Keep the provisioning VLAN isolated from regular clients.
- Stop `netboot-serve.sh` when rollout is complete.