ops: add CI boot-image releases and Unifi PXE rollout

every.channel 2026-02-28 22:53:59 -08:00
parent 043b1730dc
commit be26313225
9 changed files with 720 additions and 0 deletions

@ -0,0 +1,35 @@
# ECP-0081: CI Boot Image Deployment to Forgejo Releases
Status: Implemented
## Context
Runner boot artifacts (netboot/ISO/SD) are currently built by hand from local Linux/Nix environments. That slows repeatable infra rollout and weakens traceability from commit to runnable images.
The constitution requires infrastructure definitions to live in-repo and stay independently operable. Boot image publication should follow the same Forgejo-primary CI model used for web deploys, while mirror hosts remain distribution-only.
## Decision
1. Add a dedicated Forgejo Actions workflow to build runner boot images from flake outputs in CI.
2. Trigger this workflow on:
   - manual dispatch (`workflow_dispatch`), and
   - release-style tags (`boot-v*`).
3. Publish built artifacts to Forgejo Releases on the primary host using repository-scoped API calls and the workflow token.
4. Exclude Codeberg mirror runs using the existing server guard (`github.server_url != 'https://codeberg.org'`).
5. Start with x86_64 image targets in CI (`netboot`, `iso`) so that runtime and runner requirements stay explicit and the change is easy to reverse.
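The trigger and guard rules in items 2 and 4 can be sketched as a small predicate. This is only an illustration of the intended logic, not the workflow itself: the function name `should_build` and its parameters are hypothetical, and `fnmatchcase` is a stand-in for the Actions tag filter (unlike that filter, its `*` also matches `/`).

```python
from fnmatch import fnmatchcase

# Values taken from the decision text above.
CODEBERG_URL = "https://codeberg.org"
TAG_PATTERN = "boot-v*"
TAG_PREFIX = "refs/tags/"


def should_build(event_name: str, ref: str, server_url: str) -> bool:
    """Return True when the boot-image workflow should run (hypothetical helper)."""
    # Item 4: mirror hosts are distribution-only, so never build there.
    if server_url == CODEBERG_URL:
        return False
    # Item 2, first bullet: a manual dispatch always qualifies.
    if event_name == "workflow_dispatch":
        return True
    # Item 2, second bullet: a tag push qualifies only if the tag matches boot-v*.
    if event_name == "push" and ref.startswith(TAG_PREFIX):
        return fnmatchcase(ref[len(TAG_PREFIX):], TAG_PATTERN)
    return False
```

Under these assumptions, a push of `refs/tags/boot-v1.2.0` on the primary host qualifies, while the same tag arriving on the Codeberg mirror does not.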
## Alternatives considered
- Keep boot image builds fully manual. Rejected because operator discipline alone does not provide consistent provenance.
- Push images to external object storage first. Rejected for now because Forgejo Releases are already part of the controlled primary platform.
- Build all architectures in one CI pass immediately. Rejected because runners for every architecture are not guaranteed to be available, which would make the initial rollout brittle.
## Rollout / teardown plan
- Rollout:
  - merge workflow + docs,
  - run a manual dispatch to verify artifact publication,
  - optionally create `boot-v*` tags for versioned image drops.
- Teardown:
  - disable or delete the boot-image workflow,
  - continue using local/manual `nix build` paths from `docs/RUNNER_IMAGES.md`.
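The "verify artifact publication" step could be checked with a sketch like the one below. It assumes the release is looked up by tag through the Gitea-compatible REST API that Forgejo exposes (`/api/v1/repos/{owner}/{repo}/releases/tags/{tag}`) and that the response carries an `assets` list with `name` fields. The helper names and the asset file names in the usage note are hypothetical; this commit does not state what the workflow actually uploads.

```python
from urllib.parse import quote


def release_by_tag_url(base_url: str, owner: str, repo: str, tag: str) -> str:
    """Build the release-by-tag API URL (path assumed from the Gitea-compatible API)."""
    return (
        f"{base_url.rstrip('/')}/api/v1/repos/"
        f"{quote(owner, safe='')}/{quote(repo, safe='')}/releases/tags/{quote(tag, safe='')}"
    )


def missing_assets(release: dict, expected: set) -> list:
    """Return the expected asset names that the release JSON does not list."""
    present = {asset["name"] for asset in release.get("assets", [])}
    return sorted(expected - present)
```

For example, after fetching the JSON for a `boot-v*` tag, `missing_assets(release, {"netboot-x86_64.tar.gz", "runner-x86_64.iso"})` would return an empty list once both (hypothetical) artifacts are attached.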

@ -0,0 +1,36 @@
# ECP-0082: Unifi PXE Rollout Path for Runner Images
Status: Implemented
## Context
Runner netboot artifacts now publish from CI, but there is no repository-native operating path for fleet provisioning on common prosumer networks (for example Unifi VLANs).
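Unifi DHCP can expose next-server/bootfile settings, but iPXE chainloading needs a conditional bootfile to avoid loops: the firmware PXE client must receive `ipxe.efi` (first stage), and iPXE itself must then receive a boot script (second stage). If both stages are handed `ipxe.efi`, iPXE keeps reloading itself. Not all controller setups expose that condition cleanly.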
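The two-stage condition can be modeled in a few lines. This is a simplified sketch rather than the logic of any script in this commit: it relies on iPXE identifying itself with the DHCP user class `iPXE`, and the function names and script URL are hypothetical.

```python
from typing import List, Optional

IPXE_USER_CLASS = "iPXE"  # user class that iPXE sends in its own DHCP requests
FIRST_STAGE = "ipxe.efi"


def select_bootfile(user_class: Optional[str], script_url: str) -> str:
    """Pick the bootfile for one DHCP request."""
    # Second stage: the client is already iPXE, so hand it the boot script.
    if user_class == IPXE_USER_CLASS:
        return script_url
    # First stage: plain firmware PXE gets the iPXE binary over TFTP.
    return FIRST_STAGE


def simulate_boot(script_url: str, max_hops: int = 4) -> List[str]:
    """Follow the chain of bootfiles a client would load, failing on a loop."""
    user_class = None  # firmware PXE does not send the iPXE user class
    chain = []
    for _ in range(max_hops):
        bootfile = select_bootfile(user_class, script_url)
        chain.append(bootfile)
        if bootfile == script_url:
            return chain
        user_class = IPXE_USER_CLASS  # ipxe.efi now issues its own DHCP request
    raise RuntimeError("boot loop: the script stage was never reached")
```

With the condition in place the chain ends after two hops (`ipxe.efi`, then the script). A server that always answers `ipxe.efi` would never leave the first branch, which is the loop described above.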
## Decision
1. Add first-party scripts for local netboot staging and serving:
   - stage x86_64 netboot artifacts from Forgejo Releases (or local tarball),
   - stage iPXE UEFI binary for TFTP,
   - run HTTP + TFTP + ProxyDHCP via `dnsmasq` for deterministic chainloading.
2. Keep Unifi DHCP as the IP authority; use ProxyDHCP only to supply bootfile logic.
3. Document a concrete NUC rollout sequence for same-VLAN provisioning.
4. Keep dependencies minimal (`curl`, `tar`, `python3`, `dnsmasq`) and avoid requiring image flashing workflows.
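The local-tarball staging path from item 1 might look roughly like the sketch below. It is an assumption-laden illustration, not the `netboot-stage` script itself: the function name, the staging layout, and the required file names passed by the caller are all hypothetical.

```python
import tarfile
from pathlib import Path
from typing import List, Sequence


def stage_netboot(tarball: Path, stage_dir: Path, required: Sequence[str]) -> List[Path]:
    """Extract a netboot tarball into stage_dir and confirm the required files exist."""
    stage_dir.mkdir(parents=True, exist_ok=True)
    root = stage_dir.resolve()
    with tarfile.open(tarball) as archive:
        for member in archive.getmembers():
            # Refuse entries that would land outside the staging directory.
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            # Refuse links and device nodes; only files and directories are staged.
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported entry type: {member.name}")
        # Use the stdlib "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(root, **kwargs)
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing staged files: {missing}")
    return [root / name for name in required]
```

A call such as `stage_netboot(Path("netboot.tar.gz"), Path("tmp/netboot"), ("bzImage", "initrd"))` would stage into the directory named in the teardown plan; the kernel and initrd names here are placeholders.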
## Alternatives considered
- Require Unifi DHCP conditional iPXE rules. Rejected because controller capabilities vary and misconfiguration risks boot loops.
- Keep manual USB-only provisioning. Rejected because it increases labor for multi-node rollout.
- Add a heavy provisioning stack (MAAS/Foreman/Kickstart integration). Rejected as too much operational overhead for current scale.
## Rollout / teardown plan
- Rollout:
  - merge scripts/docs,
  - run `netboot-stage` on the boot server,
  - run `netboot-serve` on the NUC VLAN and boot hosts via PXE.
- Teardown:
  - stop `netboot-serve`,
  - remove staged artifacts under `tmp/netboot`,
  - continue with ISO+USB fallback path.