# ECP-0122: Publisher Source Locks And Cgroup Cleanup
Status: Draft
## Problem statement
LA channels disappeared when stale proof/archive publisher helpers kept HDHomeRun tuner HTTP streams
open after the managed publishers restarted. The restarted publishers received `503 Service Unavailable`
from the tuners and stopped refreshing the public stream directory, so the guide expired to empty.
## Constraints
- A publisher restart must not leave child media processes holding tuners.
- A duplicate publisher on the same node must not open the same physical source URL.
- Keep rollback simple and deployment-owned; no source-device firmware or manual tuner reset should be
  required for normal recovery.
## Decision
The NixOS publisher wrapper now takes a non-blocking per-source lock under
`/run/every-channel/source-locks` before launching `ec-node`. If another managed publisher on the
same node is already reading that input URL, the duplicate launch logs the conflict and skips
instead of opening a second tuner stream.
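
To make the non-blocking per-source lock concrete, here is a minimal Python sketch. It assumes an `flock`-style advisory lock and a lock file named after a hash of the source URL. Both are assumptions for illustration: the actual wrapper may use a different mechanism and naming scheme, and `lock_path` and `try_acquire_source_lock` are hypothetical names.

```python
import fcntl
import hashlib
import os


def lock_path(lock_dir, source_url):
    # Hypothetical naming scheme: hash the URL so it is a safe file name.
    digest = hashlib.sha256(source_url.encode("utf-8")).hexdigest()
    return os.path.join(lock_dir, digest + ".lock")


def try_acquire_source_lock(lock_dir, source_url):
    """Return an open fd that holds the lock, or None if it is already held.

    Assumption: an advisory flock() on a per-source lock file. The lock
    is released when the fd is closed or the holding process exits.
    """
    os.makedirs(lock_dir, exist_ok=True)
    fd = os.open(lock_path(lock_dir, source_url), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A caller in this sketch would log and exit when the function returns `None`, and otherwise keep the descriptor open for as long as the publisher runs. Because the kernel drops an `flock` lock when its holder exits, a crashed publisher would not leave the source locked under this design.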
Publisher and archive worker services also set explicit `KillMode=control-group`,
`TimeoutStopSec=10s`, and `SendSIGKILL=true`, and archive auto-workers terminate tracked children on
shutdown before systemd's cgroup cleanup runs. The async `wt-publish` and `nbc-wt-publish` ffmpeg
children are marked kill-on-drop so cancelled Rust futures do not strand encoder children.
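
The "terminate tracked children on shutdown" step can be sketched as follows. This is an illustrative Python sketch, not the project's Rust code: `stop_children` and `grace_seconds` are hypothetical names, and the pattern shown (SIGTERM first, SIGKILL for anything that outlives a shared grace period) is an assumption about how such a shutdown path might look.

```python
import subprocess
import time


def stop_children(children, grace_seconds=5.0):
    """Stop tracked child processes and return their exit codes.

    Sends SIGTERM to every child that is still running, then waits up to
    grace_seconds in total before sending SIGKILL to any survivor.
    """
    for child in children:
        if child.poll() is None:
            child.terminate()  # SIGTERM on POSIX
    # One shared deadline, so total shutdown time stays bounded.
    deadline = time.monotonic() + grace_seconds
    for child in children:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            child.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            child.kill()  # SIGKILL on POSIX
            child.wait()
    return [child.returncode for child in children]
```

In this sketch the grace period would be kept below the unit's `TimeoutStopSec` so that the worker reaps its own children first, leaving systemd's control-group kill as the backstop.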
## Alternatives considered
- Rely on operator cleanup only. Rejected because the failure silently empties the public guide after
  TTL expiry.
- Run duplicate publishers for redundancy. Rejected because OTA tuner capacity is the scarce resource;
  redundancy should happen after one source read, via publisher fanout and relay mirroring.
- Add only systemd cgroup cleanup. Rejected because it does not prevent two managed units from
  intentionally opening the same source at the same time.
## Rollout / teardown plan
Deploy the NixOS module update to every publisher node. Confirm that no stale proof/archive helpers
remain, all managed publisher units are active, and `/api/public-streams` lists the expected channels.
To roll back, revert this module change and redeploy; source locks are runtime files under `/run`
and disappear on reboot.