ECP-0065: NixOS Runner Images + Netboot Artifacts

Status: Implemented

Decision

Publish a first-party, reproducible NixOS "runner" system definition from this repo, and expose build outputs suitable for:

  • local-disk installs (pave/reinstall),
  • netboot (iPXE/PXE) bootstrap, and
  • byte-identical runner OS images produced in CI.

The runner system:

  • is defined in-repo as a nixosConfiguration in flake.nix,
  • exposes the repo source tree inside the OS at a stable, read-only path so the node can self-build and verify from the same flake,
  • uses ec-node as the primary long-running publisher binary, with orchestration via NixOS + systemd, and
  • defaults to a read-only root filesystem with a tmpfs-backed overlayfs upperdir (appliance semantics); image/bootstrap variants (netboot/ISO/sdimage) may disable this where it conflicts with their initrd/root setup.
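
The runner definition described above might look roughly like the following flake.nix sketch. It is illustrative only and not the repo's actual flake: the nixpkgs branch, the mkRunner helper, the /etc/ec-runner/source path, and the ec-node package attribute are all assumptions, and the boot loader, filesystem, and read-only-root settings are left out.

```nix
{
  description = "Illustrative sketch of a runner system definition";

  # Assumption: the nixpkgs branch the repo actually pins may differ.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      # Hypothetical helper: build one runner system per target platform.
      mkRunner = system: nixpkgs.lib.nixosSystem {
        inherit system;
        modules = [
          ({ ... }: {
            # Expose this flake's source tree at a stable path. The target is
            # a /nix/store path, so it is read-only on the running node.
            # The path name is an assumption.
            environment.etc."ec-runner/source".source = "${self}";

            # Run ec-node under systemd. The package attribute and binary
            # path are hypothetical; this sketch does not define that
            # package, and the repo may expose ec-node differently.
            systemd.services.ec-node = {
              wantedBy = [ "multi-user.target" ];
              after = [ "network-online.target" ];
              wants = [ "network-online.target" ];
              serviceConfig = {
                ExecStart = "${self.packages.${system}.ec-node}/bin/ec-node";
                Restart = "always";
              };
            };

            # Omitted: boot loader, fileSystems, and the read-only root
            # overlay, which a bootable configuration also needs.
          })
        ];
      };
    in {
      nixosConfigurations = {
        ec-runner-aarch64 = mkRunner "aarch64-linux";
        ec-runner-x86_64 = mkRunner "x86_64-linux";
      };
    };
}
```

Because the source tree is referenced through self, the path baked into the image is the same store path the image was built from, which is what would let a node rebuild from the embedded flake and compare results.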

The initial implementation targets aarch64-linux, built locally via OrbStack. x86_64-linux is defined in the flake but may not be built until an x86 builder is available.

Motivation

  • "Bootstrap path == update path": the same flake definition and CI-built artifacts should be usable to (re)install and to update.
  • Fleet operability: remove per-node hand configuration; treat nodes as cattle.
  • Verifiability: runners can rebuild and compare their OS closure against the CI artifacts using the embedded flake source.

Scope

In scope:

  • nixosConfigurations.ec-runner-{aarch64,x86_64} in flake.nix.
  • nixosConfigurations.ec-runner-*-netboot and nixosConfigurations.ec-runner-*-iso for image artifacts.
  • A minimal runner NixOS module that provides baseline host settings and the stable in-OS flake source path.
  • Docs/scripts for building netboot outputs locally in OrbStack.
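
One way the image configurations listed above could be surfaced as buildable artifacts is sketched below. This is a fragment meant to sit inside the flake's outputs, where self is in scope. It assumes the netboot and ISO configurations import the stock nixpkgs netboot and ISO image modules, which provide the system.build attributes used here. The package names are hypothetical, and the configuration names assume the * in ec-runner-*-netboot expands to aarch64.

```nix
{
  packages.aarch64-linux =
    let
      # Build outputs of the image configurations named in this proposal.
      netboot = self.nixosConfigurations.ec-runner-aarch64-netboot.config.system.build;
      iso = self.nixosConfigurations.ec-runner-aarch64-iso.config.system.build;
    in {
      # Kernel, initrd, and iPXE script from the nixpkgs netboot module.
      runner-netboot-kernel = netboot.kernel;
      runner-netboot-initrd = netboot.netbootRamdisk;
      runner-netboot-ipxe = netboot.netbootIpxeScript;

      # Installer/boot image from the nixpkgs ISO image module.
      runner-iso = iso.isoImage;
    };
}
```

With aliases like these, a local build on an aarch64-linux host (for example inside an OrbStack Linux machine) could be started with nix build .#runner-netboot-kernel, and the same attribute paths could later be built in CI.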

Out of scope (deferred):

  • CI publishing pipeline (binary cache, attestation, release upload).
  • Remote runtime provisioning (fetching per-node channel lists).
  • Hardware-accelerated transcode changes (keep current CPU x264 baseline).

Alternatives considered

  • Keep runner images out-of-repo and publish ad hoc artifacts. Rejected because it weakens reproducibility and provenance.
  • Restrict to one install path only (disk install only). Rejected because netboot/bootstrap is required for fleet recovery.

Rollout / Reversibility

  • Rollout begins with local builds and a single test machine.
  • Reversible by removing the nixosConfigurations and the runner module; existing nodes can continue to run via manually started tmux sessions or ad hoc installs.