Sovereign Deploy: ecp-forge
This repository owns deployment of git.every.channel (Hetzner 300TB host).
Requirements
- SSH access to root@git.every.channel.
- A local key that matches the host's authorized_keys (default: ~/.ssh/id_ed25519).
- nix with flakes enabled.
- For emergency Hetzner recovery: Robot Webservice credentials, either in the 1Password item
  "Hetzner Robot" or in EVERY_CHANNEL_ROBOT_USER / EVERY_CHANNEL_ROBOT_PASSWORD.
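A quick local check of the first three requirements can save a failed deploy. The sketch below is a hypothetical helper (`forge_preflight` is not a script in this repo). It only checks what is visible locally (`nix` and `ssh` on PATH, a readable key pair) and does not confirm that flakes are enabled or that the key is in the host's authorized_keys.

```shell
#!/usr/bin/env bash
# Hypothetical local preflight for the deploy requirements.
# Checks only local state: nix and ssh on PATH, plus a readable private key
# and its .pub file. Flake support and the host's authorized_keys are NOT
# verified here.
forge_preflight() {
  local identity="${1:-${EVERY_CHANNEL_FORGE_SSH_IDENTITY:-$HOME/.ssh/id_ed25519}}"
  local status=0 tool

  for tool in nix ssh; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing command: $tool" >&2
      status=1
    fi
  done

  if [[ ! -r "$identity" ]]; then
    echo "unreadable SSH identity: $identity" >&2
    status=1
  fi
  if [[ ! -r "$identity.pub" ]]; then
    echo "missing public key: $identity.pub" >&2
    status=1
  fi

  return "$status"
}
```

For example, `forge_preflight && ./scripts/deploy-ecp-forge.sh` only starts the deploy when the local checks pass.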
Deploy
./scripts/deploy-ecp-forge.sh
For the OP Stack operator path and observation-rail validation, see:
cat docs/OP_STACK_ECP_FORGE.md
The script is equivalent to running:
NIX_SSHOPTS="-o BatchMode=yes -o IdentityAgent=none -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519" \
nix run nixpkgs#nixos-rebuild -- \
--flake .#ecp-forge \
--target-host root@git.every.channel \
--build-host root@git.every.channel \
--use-remote-sudo \
switch
Overrides
- EVERY_CHANNEL_FORGE_TARGET_HOST (default: root@git.every.channel)
- EVERY_CHANNEL_FORGE_BUILD_HOST (default: same as the target host)
- EVERY_CHANNEL_FORGE_SSH_IDENTITY (default: ~/.ssh/id_ed25519)
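The overrides behave like ordinary shell defaults feeding the equivalent nixos-rebuild command. A minimal sketch, assuming "same as target" means the resolved target (so overriding only the target also moves the build); `forge_rebuild` and its `DRY_RUN` switch are hypothetical and not taken from `deploy-ecp-forge.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve the documented overrides and run (or, with
# DRY_RUN=1, print) the equivalent nixos-rebuild command.
forge_rebuild() {
  local target="${EVERY_CHANNEL_FORGE_TARGET_HOST:-root@git.every.channel}"
  # Assumption: "same as target" means the resolved target, override included.
  local build="${EVERY_CHANNEL_FORGE_BUILD_HOST:-$target}"
  local identity="${EVERY_CHANNEL_FORGE_SSH_IDENTITY:-$HOME/.ssh/id_ed25519}"
  local sshopts="-o BatchMode=yes -o IdentityAgent=none -o IdentitiesOnly=yes -i $identity"
  local -a cmd=(
    nix run nixpkgs#nixos-rebuild --
    --flake .#ecp-forge
    --target-host "$target"
    --build-host "$build"
    --use-remote-sudo
    switch
  )

  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of running it.
    printf "NIX_SSHOPTS='%s' %s\n" "$sshopts" "${cmd[*]}"
  else
    NIX_SSHOPTS="$sshopts" "${cmd[@]}"
  fi
}
```

For example, `EVERY_CHANNEL_FORGE_BUILD_HOST=root@builder.example DRY_RUN=1 forge_rebuild` prints the command with a separate build host (`builder.example` is a placeholder).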
Emergency Robot recovery
Use this only when both Forge HTTPS and SSH are unreachable. The dedicated host is server
2800441 at 95.216.114.54.
./scripts/hetzner-robot-forge.sh probe
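The outage condition can also be confirmed by hand before touching Robot. A minimal TCP-level sketch with hypothetical function names: it only checks whether the SSH and HTTPS ports accept connections (not whether Forgejo actually answers), and it may differ from what `probe` does. It needs bash (for `/dev/tcp`) and coreutils `timeout`.

```shell
#!/usr/bin/env bash
# Returns 0 if a TCP connection to host:port opens within the timeout.
tcp_open() {
  local host="$1" port="$2" secs="${3:-5}"
  # Open and immediately close a connection via bash's /dev/tcp.
  timeout "$secs" bash -c ': >"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Returns 0 only when BOTH the SSH and HTTPS ports are unreachable,
# matching the "both HTTPS and SSH" condition for Robot recovery.
forge_outage_confirmed() {
  local host="${1:-git.every.channel}" ssh_port="${2:-22}" https_port="${3:-443}"
  if tcp_open "$host" "$ssh_port"; then
    echo "SSH port $ssh_port on $host is reachable" >&2
    return 1
  fi
  if tcp_open "$host" "$https_port"; then
    echo "HTTPS port $https_port on $host is reachable" >&2
    return 1
  fi
  echo "SSH and HTTPS both unreachable on $host" >&2
  return 0
}
```

Running `forge_outage_confirmed 95.216.114.54` against the IP as well as the hostname helps rule out a DNS problem.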
If the probe confirms an outage, sign in to the 1Password CLI so the wrapper can read the existing Robot Webservice item at runtime, then check the server status:
op signin
./scripts/hetzner-robot-forge.sh status
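Credentials can come from either source listed under Requirements. A sketch of one plausible resolution order, assuming the environment variables win over 1Password and that the "Hetzner Robot" item stores them under the field labels `username` and `password`; neither the order nor the labels are taken from the wrapper. The `op` calls assume 1Password CLI v2 syntax; `--reveal` is needed on recent releases to print concealed fields, and may need to be dropped on older ones.

```shell
#!/usr/bin/env bash
# Hypothetical credential resolution: env vars first, then 1Password.
# Sets ROBOT_USER and ROBOT_PASSWORD in the calling shell (not exported).
robot_credentials() {
  if [[ -n "${EVERY_CHANNEL_ROBOT_USER:-}" && -n "${EVERY_CHANNEL_ROBOT_PASSWORD:-}" ]]; then
    ROBOT_USER="$EVERY_CHANNEL_ROBOT_USER"
    ROBOT_PASSWORD="$EVERY_CHANNEL_ROBOT_PASSWORD"
    return 0
  fi
  if command -v op >/dev/null 2>&1; then
    # Requires an active `op signin` session. Field labels are assumptions.
    ROBOT_USER="$(op item get "Hetzner Robot" --fields label=username)" || return 1
    ROBOT_PASSWORD="$(op item get "Hetzner Robot" --fields label=password --reveal)" || return 1
    return 0
  fi
  echo "no Robot credentials: set the env vars or sign in with op" >&2
  return 1
}
```

With the variables set, a manual status query against the Robot Webservice looks like `curl -fsS -u "$ROBOT_USER:$ROBOT_PASSWORD" https://robot-ws.your-server.de/server/2800441`; on shared machines, note that `-u` exposes the password in the process list.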
To boot the host into Hetzner Rescue and issue a hardware reset:
./scripts/hetzner-robot-forge.sh recover
./scripts/hetzner-robot-forge.sh wait-ssh
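The `wait-ssh` step amounts to polling until SSH answers. A generic sketch of such a poll loop; `wait_until` is a hypothetical helper, not the wrapper's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical poll helper in the spirit of `wait-ssh`: run a command
# repeatedly until it succeeds or the deadline (in seconds) passes.
# Uses bash's SECONDS counter, so timing has one-second resolution.
wait_until() {
  local deadline_secs="$1" interval_secs="$2"
  shift 2
  local end=$((SECONDS + deadline_secs))

  while true; do
    if "$@"; then
      return 0
    fi
    if (( SECONDS >= end )); then
      echo "timed out after ${deadline_secs}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_secs"
  done
}
```

For example, `wait_until 900 15 ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@95.216.114.54 true`. The host-key options are there because the Rescue system presents different host keys than production; drop them once production is back.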
The wrapper masks Robot-generated rescue passwords by default and tries to attach the local SSH key's
fingerprint when activating rescue. Set EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT if the key
uploaded to Robot has a different fingerprint. Set EVERY_CHANNEL_ROBOT_PRINT_SENSITIVE=1 to print
the rescue password, and only when password-based rescue login is required.
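To find a value for EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT, `ssh-keygen -l -E md5 -f ~/.ssh/id_ed25519.pub` prints a key's MD5 fingerprint (prefixed with `MD5:`). The sketch below computes the same colon-separated MD5 value with coreutils only, assuming Robot identifies uploaded keys by this MD5 form; compare the result with the fingerprint Robot shows before setting the variable. `md5_fingerprint` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Colon-separated MD5 fingerprint of an OpenSSH public key line: the MD5 of
# the base64-decoded key blob (second field). Assumes GNU coreutils
# (base64 -d, md5sum).
md5_fingerprint() {
  local pubkey_line="$1" blob
  blob="$(awk '{print $2}' <<<"$pubkey_line")"
  if [[ ! "$blob" =~ ^[A-Za-z0-9+/]+=*$ ]]; then
    echo "no base64 key blob found" >&2
    return 1
  fi
  # Decode, hash, keep the hex digest, then insert a colon after each byte.
  printf '%s' "$blob" | base64 -d | md5sum | awk '{print $1}' | sed 's/../&:/g; s/:$//'
}
```

Usage: `md5_fingerprint "$(cat ~/.ssh/id_ed25519.pub)"`.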
If production boots but public SSH and HTTPS still time out, inspect the previous boot from Rescue.
The known cause to check is host-wide VPN state: mullvad-daemon.service must not be active on
ecp-forge, because its firewall policy can block public Forge ingress even when the Robot firewall
and the NixOS firewall allow the ports. If a generation that has not yet been redeployed still
starts Mullvad, and Mullvad's mutable cached target state has been rewritten to secured, back up
/boot/grub/grub.cfg, append systemd.mask=mullvad-daemon.service
systemd.mask=mullvad-early-boot-blocking.service to the kernel (linux) line of the default boot
entry, and reboot production. After public SSH returns, deploy this repo's NixOS config so the
bootloader is regenerated without the emergency mask.
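The backup and append step can be scripted so the two always happen together. A hypothetical sketch (`mask_mullvad_in_grub` is not in this repo), assuming the production /boot is mounted from Rescue (for example, so the file is /mnt/boot/grub/grub.cfg) and that the first line whose first word is `linux` belongs to the default entry; check both before rebooting.

```shell
#!/usr/bin/env bash
# Hypothetical emergency edit, run from Rescue against the mounted grub.cfg.
# Backs the file up, then appends the Mullvad masks to the first "linux"
# line only. Returns non-zero if no such line exists.
mask_mullvad_in_grub() {
  local cfg="$1"
  local params="systemd.mask=mullvad-daemon.service systemd.mask=mullvad-early-boot-blocking.service"
  local backup
  backup="$cfg.bak.$(date +%Y%m%d%H%M%S)"

  # Idempotent: do nothing if the mask is already present.
  if grep -q 'systemd.mask=mullvad-daemon.service' "$cfg"; then
    echo "mask already present in $cfg" >&2
    return 0
  fi

  cp -p "$cfg" "$backup" || return 1
  # Rewrite cfg from the backup, touching only the first kernel line.
  awk -v params="$params" '
    !done && $1 == "linux" { print $0 " " params; done = 1; next }
    { print }
    END { if (!done) exit 1 }
  ' "$backup" > "$cfg"
}
```

To inspect the previous boot first, and assuming the Rescue system has journalctl and production keeps a persistent journal with root mounted at /mnt, `journalctl -D /mnt/var/log/journal -b -0 -u mullvad-daemon.service` shows Mullvad's activity during the last boot recorded there.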
Verify
ssh -o BatchMode=yes -o IdentityAgent=none -i ~/.ssh/id_ed25519 root@git.every.channel \
'hostnamectl --static; systemctl is-active forgejo caddy every-channel-netboot-stage every-channel-netboot'
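`systemctl is-active` prints one state per unit in argument order, but the systemctl man page describes its exit code as 0 when at least one listed unit is active, so the exit code alone does not prove every unit is up. A hypothetical checker (`check_verify_output` is not in this repo) that parses the Verify output line by line:

```shell
#!/usr/bin/env bash
# Hypothetical pass/fail check for the Verify output: first line is the
# static hostname, each following line is one unit state that must be
# "active", and the number of unit lines must match expected_units.
check_verify_output() {
  local expected_host="$1" expected_units="$2" output="$3"
  local line host="" units=0

  while IFS= read -r line; do
    if [[ -z "$host" ]]; then
      host="$line"            # first line: hostnamectl --static
      continue
    fi
    units=$((units + 1))
    if [[ "$line" != active ]]; then
      echo "unit #$units is '$line', not active" >&2
      return 1
    fi
  done <<<"$output"

  if [[ "$host" != "$expected_host" ]]; then
    echo "hostname is '$host', expected '$expected_host'" >&2
    return 1
  fi
  if (( units != expected_units )); then
    echo "got $units unit states, expected $expected_units" >&2
    return 1
  fi
}
```

Usage, assuming the static hostname is `ecp-forge` (this README does not confirm it): capture the Verify command's output with `out="$(ssh ...)"`, then run `check_verify_output ecp-forge 4 "$out"` for the four units listed.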