Wire HDHomeRun observations and recover Forge OP Stack
parent 8065860449
commit 0d86104762
18 changed files with 1613 additions and 58 deletions
@@ -0,0 +1,50 @@
# ECP-0110: `ecp-forge` Hetzner Robot recovery wrapper

Status: Draft

## Problem / context

`git.every.channel` is a single dedicated Hetzner host. When SSH and HTTPS are both unreachable,
the blockchain and Forgejo validation path stalls before repo-owned deployment tools can connect.
Robot can recover the host, but browser-only recovery is hard to repeat and easy to lose across
agent handoffs.

## Decision

Add a repo-local Robot wrapper for `ecp-forge` recovery:

- default to server `2800441` / `95.216.114.54`,
- read Robot Webservice credentials from environment variables or the existing 1Password item at
  runtime,
- avoid storing Robot passwords in git or shell profiles,
- expose explicit status, rescue, reset, recover, and reachability-probe commands, and
- mask Robot-generated rescue passwords unless the operator explicitly opts into printing them.
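
The credential lookup and password masking described in the list above could look roughly like the
sketch below. It is not the wrapper's actual code: the variable names `HETZNER_ROBOT_USER`,
`ROBOT_OP_ITEM`, and `SHOW_ROBOT_SECRETS`, the `op://Private/hetzner-robot` item path, and both
function names are assumptions made for illustration. Only `op read` is a real 1Password CLI command.

```shell
# Hypothetical 1Password secret reference; the real item path is not stated in this ECP.
ROBOT_OP_ITEM="${ROBOT_OP_ITEM:-op://Private/hetzner-robot}"

# resolve_credential VALUE_FROM_ENV OP_FIELD
# Print the environment-supplied value if it is non-empty; otherwise read the
# named field from 1Password at runtime. Nothing is written to disk.
resolve_credential() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    op read "${ROBOT_OP_ITEM}/$2"
  fi
}

# mask_secret SECRET
# Print a fixed-width mask unless the operator opts in with SHOW_ROBOT_SECRETS=1.
# The fixed width avoids leaking the length of the rescue password.
mask_secret() {
  if [ "${SHOW_ROBOT_SECRETS:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' '********'
  fi
}
```

A caller would use it as `robot_user="$(resolve_credential "${HETZNER_ROBOT_USER:-}" username)"`, so
an exported variable takes precedence and 1Password is consulted only as a fallback.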

The wrapper treats rescue activation and reset as operational recovery steps, not deployment. Once
the host is reachable again, `scripts/deploy-ecp-forge.sh` remains the source of truth for the
NixOS system state.

## Consequences

- Future agents can recover the Forge after a local 1Password CLI sign-in without asking for pasted
  Robot secrets.
- The host identity and Robot server number are documented in the repo instead of being rediscovered
  from the browser UI.
- Recovery actions remain explicit commands; ordinary probes never mutate Robot state.
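
To illustrate what a non-mutating probe might look like, here is a minimal sketch that only opens
TCP connections to the host and never calls the Robot Webservice. The function names, the
`FORGE_HOST` variable, the use of `nc`, and the default ports 22 and 443 are assumptions; the real
wrapper may probe differently.

```shell
FORGE_HOST="${FORGE_HOST:-git.every.channel}"

# check_port HOST PORT
# Succeed if a TCP connection opens within 5 seconds. Assumes a netcat that
# supports -z (scan only) and -w (timeout).
check_port() {
  nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# probe_forge
# Report SSH and HTTPS reachability. Read-only: it makes no Robot API calls.
# Returns 0 if either service answers, 1 only when both are unreachable.
probe_forge() {
  ssh_state=down
  https_state=down
  if check_port "$FORGE_HOST" 22; then ssh_state=up; fi
  if check_port "$FORGE_HOST" 443; then https_state=up; fi
  printf 'ssh=%s https=%s\n' "$ssh_state" "$https_state"
  [ "$ssh_state" = up ] || [ "$https_state" = up ]
}
```

Returning failure only when both services are down mirrors the problem statement, where the stall
occurs once SSH and HTTPS are both unreachable.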

## Alternatives considered

- Continue browser-only Robot recovery. Rejected because it is too stateful for repeated agent
  handoffs and does not leave a repo-owned runbook.
- Store Robot credentials in a repo-local file. Rejected because Robot credentials are operational
  secrets and should stay in 1Password or the caller's environment.
- Move recovery into the deploy script. Rejected because Robot rescue/reset is a host-recovery action,
  while `deploy-ecp-forge.sh` should remain the NixOS deployment entrypoint.

## Rollout / teardown

1. Add `scripts/hetzner-robot-forge.sh`.
2. Document the emergency path in `docs/DEPLOY_ECP_FORGE.md`.
3. Use `probe` first, then `status`, then `recover` only when the Forge is unreachable.
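
The ordering in step 3 can be sketched as a small guard around the wrapper. The subcommand names
come from this ECP, but the function name, the `ROBOT_WRAPPER` variable, and the convention that
`probe` exits non-zero when the Forge is unreachable are assumptions.

```shell
# Path from the rollout list; overridable so the guard can be exercised against a stub.
ROBOT_WRAPPER="${ROBOT_WRAPPER:-scripts/hetzner-robot-forge.sh}"

# recover_if_unreachable
# Run the read-only probe first and stop if the Forge answers. Otherwise show
# the Robot status, then run the mutating recover step.
recover_if_unreachable() {
  if "$ROBOT_WRAPPER" probe; then
    echo "Forge reachable; no Robot action taken."
    return 0
  fi
  # Assumed convention: a failing status call aborts before anything is mutated.
  "$ROBOT_WRAPPER" status || return 1
  "$ROBOT_WRAPPER" recover
}
```

Keeping `recover` behind a failed probe means a routine check never triggers a rescue boot or
reset by accident.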

Teardown consists of removing the wrapper and returning to browser-only Robot operations.