# ECP-0065: NixOS Runner Images + Netboot Artifacts
Status: Implemented
## Decision
Publish a first-party, reproducible NixOS "runner" system definition from this repo, and expose build outputs suitable for:
- local-disk installs (pave/reinstall),
- netboot (iPXE/PXE) bootstrap, and
- byte-identical runner OS images produced in CI.
The runner system:
- is defined in-repo as a `nixosConfiguration` in `flake.nix`,
- exports the repo source tree inside the OS at a stable path (read-only) so the node can self-build and verify from the same flake,
- uses `ec-node` as the primary long-running publisher binary, with orchestration via NixOS + systemd, and
- defaults to a read-only root filesystem with a tmpfs-backed overlayfs upperdir (appliance semantics); image/bootstrap variants (netboot/ISO/sdimage) may disable this where it conflicts with their initrd/root setup.
The initial implementation targets `aarch64-linux` (built locally via OrbStack). `x86_64-linux` is defined in the flake but may not be built until an x86 builder is available.
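
As a rough illustration of the shape described above, a minimal `flake.nix` sketch might look like the following. It is not the repo's actual flake: the nixpkgs branch, the `runnerModule` and `mkRunner` names, the `/etc/ec/src` path, and the `stateVersion` value are all assumptions chosen for the example, and the `ec-node` service and read-only root setup are left out.

```nix
{
  description = "Sketch: in-repo runner system definition (illustrative only)";

  # Assumption: placeholder branch; pin whatever the repo actually uses.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      # Hypothetical runner module: baseline settings plus the embedded source.
      runnerModule = { lib, ... }: {
        # Expose this flake's source tree inside the OS at a stable path.
        # The tree lives in the Nix store, so it is read-only.
        # "/etc/ec/src" is an assumed location, not the repo's real one.
        environment.etc."ec/src".source = "${self}";

        # Placeholder value. A bootable system also needs boot loader and
        # filesystem settings, or an image module that provides them.
        system.stateVersion = lib.mkDefault "24.05";
      };

      # Build one runner system per target platform from the same module.
      mkRunner = system: nixpkgs.lib.nixosSystem {
        inherit system;
        modules = [ runnerModule ];
      };
    in {
      nixosConfigurations = {
        ec-runner-aarch64 = mkRunner "aarch64-linux";
        ec-runner-x86_64 = mkRunner "x86_64-linux";
      };
    };
}
```

Because the source is referenced through `self`, the embedded tree is the same revision the system was built from, which is what lets a node rebuild and compare against the same flake.
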
## Motivation
- "Bootstrap path == update path": the same flake definition and CI-built artifacts should be usable to (re)install and to update.
- Fleet operability: remove per-node hand configuration; treat nodes as cattle.
- Verifiability: runners can rebuild and compare their OS closure against the CI artifacts using the embedded flake source.
## Scope
In scope:
- `nixosConfigurations.ec-runner-{aarch64,x86_64}` in `flake.nix`.
- `nixosConfigurations.ec-runner-*-netboot` and `nixosConfigurations.ec-runner-*-iso` for image artifacts.
- Minimal runner NixOS module for baseline host settings and stable in-OS flake source path.
- Docs/scripts for building netboot outputs locally in OrbStack.
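
The netboot and ISO variants listed above could plausibly be derived by layering a stock nixpkgs image module on top of the runner module. The sketch below assumes a hypothetical helper file that receives the flake's `nixpkgs` input and a `runnerModule`; the `mkVariant` name and the choice of the minimal installer modules are illustrative assumptions, and the repo's actual layout may differ.

```nix
# Sketch only: returns an attribute set of image-variant systems.
# `nixpkgs` is the flake input; `runnerModule` is the base runner module.
{ nixpkgs, runnerModule }:
let
  mkVariant = system: imageModule: nixpkgs.lib.nixosSystem {
    inherit system;
    modules = [
      runnerModule
      # modulesPath points at nixpkgs' nixos/modules directory.
      ({ modulesPath, ... }: {
        imports = [ (modulesPath + "/${imageModule}") ];
      })
    ];
  };
in {
  ec-runner-aarch64-netboot =
    mkVariant "aarch64-linux" "installer/netboot/netboot-minimal.nix";
  ec-runner-aarch64-iso =
    mkVariant "aarch64-linux" "installer/cd-dvd/installation-cd-minimal.nix";
}
```

With a configuration of this shape, the nixpkgs netboot module exposes the kernel, ramdisk, and iPXE script under `config.system.build` (`kernel`, `netbootRamdisk`, `netbootIpxeScript`), and the ISO module exposes `isoImage`. A local build would then be along the lines of `nix build .#nixosConfigurations.ec-runner-aarch64-netboot.config.system.build.netbootRamdisk`.
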
Out of scope (deferred):
- CI publishing pipeline (binary cache, attestation, release upload).
- Remote runtime provisioning (fetching per-node channel lists).
- Hardware-accelerated transcode changes (keep current CPU x264 baseline).
## Alternatives considered
- Keep runner images out-of-repo and publish ad hoc artifacts. Rejected because it weakens reproducibility and provenance.
- Restrict to one install path only (disk install only). Rejected because netboot/bootstrap is required for fleet recovery.
## Rollout / Reversibility
- Rollout begins with local builds and a single test machine.
- Reversible by removing the `nixosConfigurations` and runner module; existing nodes can continue to run via manual `tmux` or ad-hoc installs.