every.channel/evolution/proposals/ECP-0111-disable-host-mullvad-for-forge-public-recovery.md

ECP-0111: Disable Host Mullvad for Forge Public Recovery

Status: Draft

Problem / context

git.every.channel must stay reachable on public SSH and HTTPS so blockchain validation, deploys, and Forgejo review can proceed. On the current ecp-forge boot, Forgejo, Caddy, and SSH socket activation all come up, but the host becomes unreachable once the host-wide Mullvad daemon connects and applies its firewall policy.

Decision

Disable host-wide Mullvad on ecp-forge and stop making forge NBC workers wait for host Mullvad. The public Forge host stays on the Hetzner interface. NBC egress that needs Mullvad should return through a process-scoped or namespace-scoped design that does not install a host-wide kill switch.

Consequences

  • git.every.channel can serve SSH, HTTPS, and ACME challenges on the public Hetzner address.
  • Forge recovery no longer depends on manual Mullvad split-tunnel state.
  • Forge NBC Philadelphia publishing loses the host-wide Mullvad egress assumption until a narrower worker-only egress path lands.

Alternatives considered

  • Keep host-wide Mullvad and rely on split-tunnel exceptions. Rejected because production logs show public SSH and HTTPS time out while Mullvad's firewall policy is active.
  • Keep Mullvad enabled but mask only Caddy or SSH from the tunnel. Rejected because the daemon's firewall policy still governs inbound public reachability at the host level.
  • Disable the whole ec-node service. Rejected because archive and blockchain workers should remain independent of the NBC egress incident.

Rollout / teardown

  1. From Rescue, inspect the previous boot and confirm that Forgejo and Caddy start before Mullvad applies its firewall policy.
  2. If Mullvad rewrites its cached target state back to secured, temporarily append systemd.mask=mullvad-daemon.service systemd.mask=mullvad-early-boot-blocking.service to the default GRUB entry and reboot production.
  3. Deploy the NixOS config that keeps host-wide Mullvad disabled, which regenerates the bootloader without the emergency mask.
  4. Verify ssh, https://git.every.channel/, Forgejo, and Caddy.
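
The reachability part of step 4 can be sketched as a small TCP probe run from a machine outside the Hetzner host. This is a minimal sketch, not part of the proposal: the function name check_tcp_ports is hypothetical, and ports 22 (SSH) and 443 (HTTPS) are assumed defaults that may not match the actual Forge configuration. It only confirms that a TCP handshake completes; it does not check Forgejo or Caddy at the application level.

```python
import socket
from typing import Dict, Iterable


def check_tcp_ports(host: str, ports: Iterable[int], timeout: float = 5.0) -> Dict[int, bool]:
    """Return a {port: reachable} map based on whether a TCP connect succeeds.

    Hypothetical helper for illustration; a timeout here is the same symptom
    the proposal describes while the host-wide firewall policy is active.
    """
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers timeouts, refused connections, and name-resolution failures.
            results[port] = False
    return results
```

For example, check_tcp_ports("git.every.channel", [22, 443]) should map both ports to True once the host is back on the public interface, assuming those are the ports in use. A False entry for either port after the deploy suggests the firewall policy is still being applied.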

Teardown: re-enable host-wide Mullvad only after a tested design preserves public inbound Forge traffic.