Wire HDHomeRun observations and recover Forge OP Stack

every.channel 2026-05-03 20:24:04 -07:00
parent 8065860449
commit 0d86104762
18 changed files with 1613 additions and 58 deletions


@ -7,6 +7,8 @@ This repository owns deployment of `git.every.channel` (Hetzner 300TB host).
- SSH access to `root@git.every.channel`.
- Local key that matches host `authorized_keys` (default: `~/.ssh/id_ed25519`).
- `nix` with flakes enabled.
- For emergency Hetzner recovery, Robot Webservice credentials in 1Password item `Hetzner Robot`
or `EVERY_CHANNEL_ROBOT_USER` / `EVERY_CHANNEL_ROBOT_PASSWORD`.
## Deploy
@ -38,6 +40,44 @@ nix run nixpkgs#nixos-rebuild -- \
- `EVERY_CHANNEL_FORGE_BUILD_HOST` (default same as target)
- `EVERY_CHANNEL_FORGE_SSH_IDENTITY` (default `~/.ssh/id_ed25519`)
## Emergency Robot recovery
Use this only when both Forge HTTPS and SSH are unreachable. The dedicated host is server
`2800441` at `95.216.114.54`.
```sh
./scripts/hetzner-robot-forge.sh probe
```
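If the wrapper is unavailable, the same question (are both HTTPS and SSH unreachable?) can be checked by hand. The sketch below is not the wrapper's implementation; the function names are hypothetical, and it assumes `curl` and a `nc` that supports `-z` and `-w` are installed locally.

```sh
# Manual reachability check. This is a sketch, not what `probe` actually runs.
# All function names here are hypothetical.

check_https() {
  # Succeeds if any HTTP response arrives within 10 seconds.
  curl -sS --max-time 10 -o /dev/null "https://$1/"
}

check_ssh() {
  # -z only tests whether the port accepts a connection; -w is the timeout in seconds.
  nc -z -w 10 "$1" 22
}

outage_confirmed() {
  # True only when BOTH checks fail, matching the "both unreachable" rule above.
  ! check_https "$1" && ! check_ssh "$1"
}
```

For example, `outage_confirmed git.every.channel && echo "both unreachable"` prints the message only when neither service answers.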
If the probe confirms the outage, sign in to the 1Password CLI so the wrapper can read the existing
Robot Webservice item at runtime:
```sh
op signin
./scripts/hetzner-robot-forge.sh status
```
To boot the host into Hetzner Rescue and issue a hardware reset:
```sh
./scripts/hetzner-robot-forge.sh recover
./scripts/hetzner-robot-forge.sh wait-ssh
```
The wrapper masks Robot-generated rescue passwords by default and tries to attach the local SSH key
fingerprint when activating rescue. Set `EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT` if Robot
uses a different uploaded key fingerprint. Set `EVERY_CHANNEL_ROBOT_PRINT_SENSITIVE=1` only when
password-based rescue login is required.
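To find a value for `EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT`, the fingerprint of a local public key can be printed with `ssh-keygen`. This sketch assumes Robot lists uploaded keys by colon-separated MD5 fingerprint; compare the output against what Robot shows before relying on it. The helper name is hypothetical.

```sh
# Print the MD5 fingerprint of a public key as colon-separated hex.
# Assumption: Robot identifies uploaded keys in this format.
robot_key_fingerprint() {
  # ssh-keygen prints "<bits> MD5:<fingerprint> <comment> (<type>)";
  # keep the second field and strip the "MD5:" prefix.
  ssh-keygen -l -E md5 -f "$1" | awk '{ sub(/^MD5:/, "", $2); print $2 }'
}
```

A possible use, with the default key path from this README:
`export EVERY_CHANNEL_ROBOT_AUTHORIZED_KEY_FINGERPRINT="$(robot_key_fingerprint ~/.ssh/id_ed25519.pub)"`.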
If production boots but public SSH and HTTPS still time out, inspect the previous boot from Rescue.
The known cause to check for is host-wide VPN state: `mullvad-daemon.service` must not be active on
`ecp-forge`, because its firewall policy can block public Forge ingress even when Robot and the
NixOS firewall allow the ports. If a generation that has not been redeployed yet still starts
Mullvad and the mutable cached target state has been rewritten to `secured`, back up
`/boot/grub/grub.cfg`, append
`systemd.mask=mullvad-daemon.service systemd.mask=mullvad-early-boot-blocking.service` to the
`linux` line of the default boot entry, and reboot production. After public SSH returns, deploy this
repo's NixOS config so the bootloader is regenerated without the emergency mask.
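The backup-and-append step can be sketched as a small function. It is an illustration, not part of the repo's scripts: the function name is hypothetical, and it assumes the default boot entry is the first `linux` line in the file, which should be confirmed by reading `grub.cfg` before running it.

```sh
# Back up grub.cfg, then append the emergency masks to the FIRST "linux" line.
# Assumption: the first "linux" line belongs to the default boot entry.
mask_mullvad_in_grub() {
  grub_cfg="$1"
  mask_args='systemd.mask=mullvad-daemon.service systemd.mask=mullvad-early-boot-blocking.service'

  # Keep an untouched copy next to the original; stop if the backup fails.
  cp -p "$grub_cfg" "$grub_cfg.bak" || return 1

  # Rewrite from the backup so the original is never read and written at once.
  awk -v args="$mask_args" '
    !done && /^[[:space:]]*linux[[:space:]]/ { print $0 " " args; done = 1; next }
    { print }
  ' "$grub_cfg.bak" > "$grub_cfg"
}
```

From Rescue, with the production filesystem mounted at a hypothetical `/mnt`, this would be called as `mask_mullvad_in_grub /mnt/boot/grub/grub.cfg`. The `.bak` copy allows a manual rollback if the edited entry does not boot.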
## Verify
```sh