Harden LA publishers and add multi-relay guide
parent 5d6f77f868
commit cfc4902016
13 changed files with 1430 additions and 402 deletions
@@ -0,0 +1,44 @@

# ECP-0122: Publisher Source Locks And Cgroup Cleanup

Status: Draft

## Problem statement

LA channels disappeared when stale proof/archive publisher helpers kept HDHomeRun tuner HTTP streams
open after the managed publishers restarted. The restarted publishers saw `503 Service Unavailable`
from the tuners, stopped refreshing the public stream directory, and the guide expired to empty.

## Constraints

- A publisher restart must not leave child media processes holding tuners.
- A duplicate publisher on the same node must not open the same physical source URL.
- Keep rollback simple and deployment-owned; no source-device firmware or manual tuner reset should be
required for normal recovery.

## Decision

The NixOS publisher wrapper now takes a non-blocking per-source lock under
`/run/every-channel/source-locks` before launching `ec-node`. If another managed publisher on the
same node is already reading that input URL, the duplicate launch logs and skips instead of opening a
second tuner stream.
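
The locking behaviour described above can be illustrated with a minimal sketch. It is not the wrapper's actual implementation, which is not shown in this note; it assumes an advisory `flock`-style lock, and the function name `try_lock_source`, the hashed file-name scheme, and the placeholder URL are all hypothetical. A temporary directory stands in for `/run/every-channel/source-locks`. The lock lives on the open file descriptor, so it is released when the holder closes the descriptor or exits, and a second attempt fails immediately rather than waiting.

```python
import fcntl
import hashlib
import os
import tempfile


def try_lock_source(source_url, lock_dir):
    """Return a file descriptor holding the lock, or None if the source is busy."""
    os.makedirs(lock_dir, exist_ok=True)
    # Hypothetical naming scheme: hash the URL so it is safe to use as a file name.
    name = hashlib.sha256(source_url.encode("utf-8")).hexdigest() + ".lock"
    fd = os.open(os.path.join(lock_dir, name), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Keep the descriptor open while publishing; closing it releases the lock.
    return fd


if __name__ == "__main__":
    # Temporary directory standing in for /run/every-channel/source-locks.
    lock_dir = tempfile.mkdtemp()
    url = "http://tuner.example/auto/v5.1"  # placeholder source URL
    first = try_lock_source(url, lock_dir)
    second = try_lock_source(url, lock_dir)
    print("first launch holds lock:", first is not None)
    print("duplicate launch skipped:", second is None)
```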

Publisher and archive worker services also set explicit `KillMode=control-group`,
`TimeoutStopSec=10s`, and `SendSIGKILL=true`, and archive auto-workers terminate tracked children on
shutdown before systemd's cgroup cleanup runs. The async `wt-publish` and `nbc-wt-publish` ffmpeg
children are marked kill-on-drop so cancelled Rust futures do not strand encoder children.
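
The "terminate tracked children on shutdown" step can be sketched as a terminate-then-kill loop with a shared grace period. This is an illustration only, not the project's worker code: the function name `stop_children` is hypothetical, the 10 second default simply mirrors the `TimeoutStopSec=10s` value above, and sleeping Python processes stand in for encoder children. The idea is that a worker cleans up the children it knows about first, so systemd's control-group kill only has to catch whatever was missed.

```python
import subprocess
import sys
import time


def stop_children(children, grace_seconds=10.0):
    """Send SIGTERM to tracked children, then SIGKILL any that outlive the grace period."""
    for child in children:
        if child.poll() is None:
            child.terminate()  # SIGTERM on POSIX
    # One deadline shared by all children, so total shutdown time stays bounded.
    deadline = time.monotonic() + grace_seconds
    for child in children:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            child.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            child.kill()  # SIGKILL on POSIX
            child.wait()
    return [child.returncode for child in children]


if __name__ == "__main__":
    # Stand-ins for encoder children; a real worker would track its media processes.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    children = [subprocess.Popen(sleeper) for _ in range(2)]
    print("exit codes:", stop_children(children, grace_seconds=5.0))
```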

## Alternatives considered

- Rely on operator cleanup only. Rejected because the failure silently empties the public guide after
TTL expiry.
- Run duplicate publishers for redundancy. Rejected because OTA tuner capacity is the scarce resource;
redundancy should happen after one source read, via publisher fanout and relay mirroring.
- Add only systemd cgroup cleanup. Rejected because it does not prevent two managed units from
intentionally opening the same source at the same time.

## Rollout / teardown plan

Deploy the NixOS module update to every publisher node. Confirm no stale proof/archive helpers remain,
all managed publisher units are active, and `/api/public-streams` lists the expected channels.
Rollback is reverting this module change and redeploying; source locks are runtime files under `/run`
and disappear on reboot.