Harden LA publishers and add multi-relay guide
Conrad Kramer 2026-06-10 01:28:15 -07:00
parent 5d6f77f868
commit cfc4902016
13 changed files with 1430 additions and 402 deletions


@@ -0,0 +1,44 @@
# ECP-0120: Material Watch Surface
Status: Draft
## Problem statement
The web watch page should feel like a real television/player surface, not a protocol console,
marketing page, or dark decorative shell. It also needs room for modern watch behaviors: channel
switching, multiview, scrubbing, clipping, direct tuning, and DVR mode.
## Constraints
- Preserve existing live WebTransport playback, DVR replay, public station scanning, share links,
and manual signal tuning.
- Keep the first screen content/player-first and operable with obvious controls.
- Use YouTube as interaction grammar, not as visual branding.
- Keep the palette bright enough for daytime/living-room use.
- Do not rely on browser-specific copy.
## Alternatives considered
- Continue the dark broadcast-console skin from ECP-0118. Rejected because it reads too much like
a control-room hero image and too little like a daily-use player.
- Copy YouTube visually. Rejected because every.channel should inherit watch-page mechanics without
borrowing YouTube brand language.
- Use pure flat CSS. Rejected because the desired direction is a tactile television object with
material depth.
## Decision
Rebuild the static web watch surface around a large video player, right-side channel rail, lower
scrubber, clip controls, and multiview tray. Use generated bitmap material assets for live-preview
tiles and subtle hardware texture, then layer a lighter skeuomorphic system in CSS: warm wood,
brushed metal, smoked acrylic, cream buttons, and broadcast-monitor geometry.
The design lineage is intentionally loose: Braun/Ulm clarity for simple control layout, Sony-style
broadcast monitor seriousness for the player frame, and Bang & Olufsen-style furniture warmth for
the room/object feel. These references are constraints, not objects to copy.
## Rollout / teardown plan
Ship as a static web UI change and validate with desktop/mobile screenshots plus the web build.
Teardown is reverting the HTML/CSS shell to the previous watch page while leaving playback,
directory, and share-link code paths intact.


@@ -0,0 +1,50 @@
# ECP-0121: Greedy Multi-Relay Streams
Status: Draft
## Problem statement
Live streams should be publishable to more than one relay at once so viewers can pick the fastest
path and regional relays can mirror each other. The current public directory shape only exposes one
`relay_url` per stream, which encourages duplicate publishers when we want redundancy. For OTA
sources, duplicate publishers are dangerous because each one opens another tuner read.
## Constraints
- Preserve the existing `relay_url`, `broadcast_name`, and `track_name` fields for deployed web,
archive, and manual watch links.
- Keep those primary fields as the compatibility contract; `relays[]` is additive and optional for
consumers.
- Do not duplicate HDHomeRun source reads to get multi-region relay presence.
- Let the public directory advertise multiple relays before every consumer implements racing.
- Keep rollback simple: clients can ignore `relays[]` and keep using the primary legacy fields.
## Decision
Add an ordered `relays[]` candidate list to public stream entries and stream upserts. Stream upserts
continue to require the primary legacy fields; the first relay is mirrored into those fields and
remains the primary/default path. `control-bridge-web` now forwards all relay transports already
present in a control announcement instead of flattening to the first relay only. Current consumers
can keep reading the legacy fields until they explicitly add relay racing.
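
A minimal sketch of how a directory might validate an upsert under this decision. The three legacy field names come from this ECP; the shape of each `relays[]` entry (a dict with a `relay_url` key), the `StreamUpsert` type, and the `normalize_upsert` helper are assumptions for illustration, not the deployed schema.

```python
from dataclasses import dataclass, field


@dataclass
class StreamUpsert:
    # Legacy primary fields named in this ECP; still required on every upsert.
    relay_url: str
    broadcast_name: str
    track_name: str
    # Additive, ordered candidate list. The entry shape is an assumption.
    relays: list[dict] = field(default_factory=list)


def normalize_upsert(raw: dict) -> StreamUpsert:
    """Validate an upsert and keep relays[0] consistent with the legacy fields."""
    for key in ("relay_url", "broadcast_name", "track_name"):
        if not raw.get(key):
            raise ValueError(f"missing primary legacy field: {key}")
    relays = list(raw.get("relays") or [])
    if not relays:
        # Legacy publisher: synthesize a one-element candidate list.
        relays = [{"relay_url": raw["relay_url"]}]
    elif relays[0].get("relay_url") != raw["relay_url"]:
        # The first candidate is the primary/default path.
        raise ValueError("relays[0] must match the primary relay_url")
    return StreamUpsert(
        raw["relay_url"], raw["broadcast_name"], raw["track_name"], relays
    )
```

Rejecting a mismatch rather than silently rewriting it is one possible policy; a directory could equally overwrite the legacy fields from `relays[0]`.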
The intended next step is a single ingest/fanout publisher: read the source once, encode/fragment
once, publish the same stream objects to LAX and NYC relay sessions, and optionally let relays mirror
to each other. Consumers can then race candidates greedily by availability/latency without causing
extra source reads.
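
The greedy consumer side could look roughly like the sketch below. It is not the project's client code: `probe` stands in for whatever connection attempt a real consumer makes (for example, opening a WebTransport session), and the entry shape with a `relay_url` key per candidate is an assumption. Consumers that have not opted in simply keep reading the legacy `relay_url`.

```python
import concurrent.futures as cf
from typing import Callable


def candidate_urls(entry: dict) -> list[str]:
    """Ordered candidates: relays[] when present, else the legacy relay_url."""
    urls = [r["relay_url"] for r in entry.get("relays") or []]
    return urls or [entry["relay_url"]]


def race_relays(entry: dict, probe: Callable[[str], None], timeout: float = 2.0) -> str:
    """Return the first candidate whose probe succeeds; fall back to the primary."""
    urls = candidate_urls(entry)
    pool = cf.ThreadPoolExecutor(max_workers=len(urls))
    futures = {pool.submit(probe, url): url for url in urls}
    try:
        # as_completed yields probes in finish order, so the fastest
        # healthy relay wins regardless of its position in the list.
        for done in cf.as_completed(futures, timeout=timeout):
            if done.exception() is None:
                return futures[done]
    except cf.TimeoutError:
        pass  # no probe finished in time
    finally:
        pool.shutdown(wait=False, cancel_futures=True)
    return urls[0]
```

Because the race only touches relays, it adds no tuner reads; the source is still read once by the publisher.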
## Alternatives considered
- Start one publisher per relay. Rejected because it duplicates source reads and can exhaust physical
tuners, which is the failure mode behind the LA outage.
- Replace the legacy fields with `relays[]`. Rejected because deployed clients and archive workers
already depend on the single-relay shape.
- Accept `relays[]` without primary legacy fields. Rejected because that would make rollback depend
on every publisher being downgraded at the same time as the directory.
- Wait for full relay racing before changing the directory. Rejected because exposing the ordered
candidate set is a small compatible step that unblocks incremental consumers.
## Rollout / teardown plan
Deploy the compatible schema first. Then add publisher fanout and consumer relay racing behind
separate flags. Teardown is removing `relays[]` from upserts and consumers; legacy primary-field
behavior remains intact throughout.


@@ -0,0 +1,44 @@
# ECP-0122: Publisher Source Locks And Cgroup Cleanup
Status: Draft
## Problem statement
LA channels disappeared when stale proof/archive publisher helpers kept HDHomeRun tuner HTTP streams
open after the managed publishers restarted. The restarted publishers saw `503 Service Unavailable`
from the tuners, stopped refreshing the public stream directory, and the guide expired to empty.
## Constraints
- A publisher restart must not leave child media processes holding tuners.
- A duplicate publisher on the same node must not open the same physical source URL.
- Keep rollback simple and deployment-owned; no source-device firmware or manual tuner reset should be
required for normal recovery.
## Decision
The NixOS publisher wrapper now takes a non-blocking per-source lock under
`/run/every-channel/source-locks` before launching `ec-node`. If another managed publisher on the
same node is already reading that input URL, the duplicate launch logs the conflict and skips
starting `ec-node` instead of opening a second tuner stream.
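
The locking pattern can be illustrated with a short sketch. The actual wrapper is a NixOS-managed script, so this Python version only demonstrates the technique: a non-blocking exclusive `flock` on one lock file per source. The lock directory is the one named above; deriving the file name from a hash of the source URL and the `try_source_lock` helper are assumptions.

```python
import fcntl
import hashlib
import os
from typing import Optional

LOCK_DIR = "/run/every-channel/source-locks"  # directory named in this ECP


def try_source_lock(source_url: str, lock_dir: str = LOCK_DIR) -> Optional[int]:
    """Return an open fd holding the lock, or None if another holder exists."""
    os.makedirs(lock_dir, exist_ok=True)
    # Assumption: one lock file per source URL, named by a hash of the URL.
    name = hashlib.sha256(source_url.encode()).hexdigest()[:16] + ".lock"
    fd = os.open(os.path.join(lock_dir, name), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A caller would log and exit when the result is `None`, and otherwise keep the descriptor open for the life of the publisher. The kernel drops the lock when the holding process exits, so a crashed publisher cannot leave a stale lock behind.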
Publisher and archive worker services also set explicit `KillMode=control-group`,
`TimeoutStopSec=10s`, and `SendSIGKILL=true`, and archive auto-workers terminate tracked children on
shutdown before systemd's cgroup cleanup runs. The async `wt-publish` and `nbc-wt-publish` ffmpeg
children are marked kill-on-drop so cancelled Rust futures do not strand encoder children.
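
The "terminate tracked children on shutdown" step follows a common pattern: signal every child, wait briefly, then escalate. The sketch below shows that pattern in Python as an analogy only; the real workers are Rust and systemd-managed, and the `ChildTracker` class and its grace period are hypothetical.

```python
import subprocess


class ChildTracker:
    """Tracks spawned helpers so shutdown can stop them deterministically."""

    def __init__(self) -> None:
        self.children: list[subprocess.Popen] = []

    def spawn(self, argv: list[str]) -> subprocess.Popen:
        child = subprocess.Popen(argv)
        self.children.append(child)
        return child

    def shutdown(self, grace_seconds: float = 5.0) -> None:
        # Ask every live child to stop first so they exit in parallel.
        for child in self.children:
            if child.poll() is None:
                child.terminate()  # SIGTERM on POSIX
        for child in self.children:
            try:
                child.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                child.kill()  # escalate to SIGKILL
                child.wait()  # reap so no zombie remains
        self.children.clear()
```

Keeping the grace period shorter than the unit's `TimeoutStopSec` leaves systemd's cgroup kill as the backstop rather than the primary cleanup path.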
## Alternatives considered
- Rely on operator cleanup only. Rejected because the failure silently empties the public guide after
TTL expiry.
- Run duplicate publishers for redundancy. Rejected because OTA tuner capacity is the scarce resource;
redundancy should happen after one source read, via publisher fanout and relay mirroring.
- Add only systemd cgroup cleanup. Rejected because it does not prevent two managed units from
intentionally opening the same source at the same time.
## Rollout / teardown plan
Deploy the NixOS module update to every publisher node. Confirm no stale proof/archive helpers remain,
all managed publisher units are active, and `/api/public-streams` lists the expected channels.
Rollback is reverting this module change and redeploying; source locks are runtime files under `/run`
and disappear on reboot.