Sovereign Deploy: ecp-forge

This repository owns deployment of git.every.channel (Hetzner 300TB host).

Requirements

  • SSH access to root@git.every.channel.
  • A local private key whose public half is listed in the host's authorized_keys (default: ~/.ssh/id_ed25519).
  • nix with flakes enabled.
  • For emergency Hetzner recovery, Robot Webservice credentials, either in the 1Password item Hetzner Robot or supplied as EVERY_CHANNEL_ROBOT_USER and EVERY_CHANNEL_ROBOT_PASSWORD.

Deploy

./scripts/deploy-ecp-forge.sh

For the OP Stack operator path and observation-rail validation, see:

cat docs/OP_STACK_ECP_FORGE.md

The deploy script is equivalent to:

NIX_SSHOPTS="-o BatchMode=yes -o IdentityAgent=none -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519" \
nix run nixpkgs#nixos-rebuild -- \
  --flake .#ecp-forge \
  --target-host root@git.every.channel \
  --build-host root@git.every.channel \
  --use-remote-sudo \
  switch

Overrides

  • EVERY_CHANNEL_FORGE_TARGET_HOST (default root@git.every.channel)
  • EVERY_CHANNEL_FORGE_BUILD_HOST (default same as target)
  • EVERY_CHANNEL_FORGE_SSH_IDENTITY (default ~/.ssh/id_ed25519)
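
A minimal sketch of how these overrides could resolve to their documented defaults, assuming the script reads them as ordinary environment variables with shell parameter defaults. The function and variable names below are hypothetical, and the real scripts/deploy-ecp-forge.sh may do this differently.

```shell
# Hypothetical sketch; the actual deploy script may resolve settings differently.
resolve_forge_settings() {
  # Target host falls back to the documented default.
  target_host="${EVERY_CHANNEL_FORGE_TARGET_HOST:-root@git.every.channel}"
  # Build host defaults to whatever the target host resolved to.
  build_host="${EVERY_CHANNEL_FORGE_BUILD_HOST:-$target_host}"
  # SSH identity falls back to the default ed25519 key.
  ssh_identity="${EVERY_CHANNEL_FORGE_SSH_IDENTITY:-$HOME/.ssh/id_ed25519}"
}

resolve_forge_settings
printf 'target=%s build=%s identity=%s\n' "$target_host" "$build_host" "$ssh_identity"
```

Under that assumption, a one-off deploy with a different key would be EVERY_CHANNEL_FORGE_SSH_IDENTITY=~/.ssh/forge_deploy ./scripts/deploy-ecp-forge.sh, where the key path is only an example.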

Emergency Robot recovery

Use this only when both Forge HTTPS and SSH are unreachable. The dedicated host is server 2800441 at 95.216.114.54.

./scripts/hetzner-robot-forge.sh probe
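
The "both unreachable" rule can be illustrated with a short sketch. This is not the wrapper's actual probe logic, which is not documented here; the function names are hypothetical, and the sketch assumes that curl and an nc supporting -z and -w are installed.

```shell
# Hypothetical sketch; NOT the actual hetzner-robot-forge.sh probe logic.

https_reachable() {
  # Any completed HTTP response counts as reachable;
  # connection, TLS, or timeout errors do not.
  curl --silent --max-time 10 --output /dev/null "https://$1/"
}

ssh_reachable() {
  # TCP connect to port 22 without authenticating.
  nc -z -w 10 "$1" 22
}

outage_confirmed() {
  # Succeeds (exit 0) only when BOTH checks fail for host $1.
  if https_reachable "$1"; then return 1; fi
  if ssh_reachable "$1"; then return 1; fi
  return 0
}
```

With these definitions, outage_confirmed git.every.channel succeeds only when neither service answers, which is the condition under which Robot recovery is appropriate.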

If the probe confirms an outage, sign in to the 1Password CLI so the wrapper can read the existing Robot Webservice item at runtime:

op signin
./scripts/hetzner-robot-forge.sh status

To boot the host into Hetzner Rescue and issue a hardware reset:

./scripts/hetzner-robot-forge.sh recover
./scripts/hetzner-robot-forge.sh wait-ssh

The wrapper masks Robot-generated rescue passwords by default and tries to attach the local SSH key fingerprint when activating rescue. Set EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT if Robot uses a different uploaded key fingerprint. Set EVERY_CHANNEL_ROBOT_PRINT_SENSITIVE=1 only when password-based rescue login is required.
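
If the fingerprint has to be set by hand, the following sketch derives one from a local public key. It assumes Robot identifies uploaded keys by their colon-separated MD5 fingerprint and that the wrapper accepts that form; neither is confirmed by this document, and robot_fingerprint is a hypothetical helper name.

```shell
# Hypothetical helper: extract a colon-separated MD5 fingerprint from
# one line of `ssh-keygen -E md5 -l` output, which looks like:
#   256 MD5:aa:bb:...:ff comment (ED25519)
robot_fingerprint() {
  # Field 2 holds "MD5:<fingerprint>"; strip the "MD5:" prefix.
  awk '{ sub(/^MD5:/, "", $2); print $2 }'
}

# Usage (requires OpenSSH's ssh-keygen; assumes the .pub file sits next to the key):
#   EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT="$(ssh-keygen -E md5 -lf ~/.ssh/id_ed25519.pub | robot_fingerprint)"
#   export EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT
```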

If production boots but public SSH and HTTPS still time out, inspect the previous boot from Rescue. The known cause to check is host-wide VPN state: mullvad-daemon.service must not be active on ecp-forge, because its firewall policy can block public Forge ingress even when both Robot and the NixOS firewall allow the ports.

If a generation that has not yet been redeployed still starts Mullvad and the mutable cached target state has been rewritten to secured, back up /boot/grub/grub.cfg, append systemd.mask=mullvad-daemon.service systemd.mask=mullvad-early-boot-blocking.service to the linux line of the default boot entry, and reboot into production. Once public SSH returns, deploy this repository's NixOS config so that the bootloader is regenerated without the emergency mask.
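
The GRUB edit described above can be sketched as a small helper. It assumes the default boot entry is the first one in grub.cfg, so only the first line beginning with linux is changed; check that against the actual file before rebooting. The helper name is hypothetical, and the path passed to it depends on where the production /boot is mounted in Rescue.

```shell
# Hypothetical helper; run from Rescue against the mounted production grub.cfg.
mask_args='systemd.mask=mullvad-daemon.service systemd.mask=mullvad-early-boot-blocking.service'

append_mullvad_mask() {
  cfg="$1"
  # Keep a backup next to the original (overwritten if the helper is run again).
  cp -p "$cfg" "$cfg.bak" || return 1
  # Append the mask arguments to the FIRST "linux" line only (assumed default entry).
  awk -v args="$mask_args" '
    !patched && /^[[:space:]]*linux[[:space:]]/ { print $0 " " args; patched = 1; next }
    { print }
  ' "$cfg.bak" > "$cfg"
}
```

A call would look like append_mullvad_mask /mnt/boot/grub/grub.cfg, where /mnt is an example mount point. Diffing grub.cfg against grub.cfg.bak afterwards should show exactly one changed line.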

Verify

ssh -o BatchMode=yes -o IdentityAgent=none -i ~/.ssh/id_ed25519 root@git.every.channel \
  'hostnamectl --static; systemctl is-active forgejo caddy every-channel-netboot-stage every-channel-netboot'
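
When several units are passed to systemctl is-active, the exit status is zero as long as at least one of them is active, so the printed states are worth checking line by line. The sketch below does that for the output of the command above. It assumes all four units are expected to report active, which may not hold for a unit that legitimately finishes and goes inactive, and the function name is hypothetical.

```shell
# Hypothetical helper: validate the output of the verify command.
# stdin: first line = static hostname, remaining lines = one state per unit.
check_forge_verify_output() {
  IFS= read -r host || return 1
  printf 'host: %s\n' "$host"
  rc=0
  while IFS= read -r state; do
    # Any state other than "active" marks the check as failed.
    [ "$state" = "active" ] || { printf 'unit not active: %s\n' "$state" >&2; rc=1; }
  done
  return "$rc"
}

# Usage: pipe the ssh command from the Verify section into the helper:
#   ssh ... root@git.every.channel '...' | check_forge_verify_output
```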