every-channel/every.channel

every.channel 91dad67fc2

Add duplicate publisher determinism proof

2026-06-10 03:33:46 -07:00

25 KiB

ECP-0156: Duplicate Publisher Deterministic Data Layer

Status: Draft

Context

Two publisher nodes may broadcast the same logical channel at the same time. The archive and relay layers need this for resilience, but duplicate publishers currently risk looking like conflicting streams instead of convergent copies of the same media.

Decision

Duplicate publishers are valid for a published channel. The data layer dedupes and verifies media by content identity, not by publisher envelope identity:

CMAF init and media segment bytes for the same input, ladder profile, and chunk cadence must be byte-for-byte identical.
BLAKE3 media hashes and per-rung Merkle roots are the shared data identity.
Publisher manifests may carry different stream_id, epoch_id, created_unix_ms, signatures, locators, and manifest ids.
The archive must treat matching media hashes from different publishers as corroborating sources.
Archive records must carry source identity. Two copied buffers with the same source_node are not duplicate-publisher proof, even when their BLAKE3 hashes match.
Divergent hashes for the same logical channel, rendition, and media time are misses that must be measured before the data is promoted as redundant.
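As a minimal sketch of the dedupe rule above, assuming hex-string hashes and a hypothetical ArchiveRecord shape (not the real archive schema), the following groups records by content hash and ignores manifest ids entirely. A hash counts as corroborated only when two distinct source_node values carried it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical archive record; field names are illustrative, not the real schema.
#[derive(Clone, Debug)]
pub struct ArchiveRecord {
    pub manifest_id: String, // publisher envelope identity: differs per publisher
    pub source_node: String, // archive source identity
    pub blake3_hex: String,  // media content identity
}

/// Groups records by media hash (never by manifest id) and collects the
/// distinct source nodes that carried each hash.
pub fn sources_by_hash(records: &[ArchiveRecord]) -> BTreeMap<String, BTreeSet<String>> {
    let mut by_hash: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for r in records {
        by_hash
            .entry(r.blake3_hex.clone())
            .or_default()
            .insert(r.source_node.clone());
    }
    by_hash
}

/// Matching bytes corroborate each other only when they come from at least
/// two distinct source_node values; copied buffers from one node do not count.
pub fn is_corroborated(sources: &BTreeSet<String>) -> bool {
    sources.len() >= 2
}
```

Keying on blake3_hex rather than manifest_id is the point of the decision: two publishers' envelopes always differ, while their media bytes should not.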

Verification

The proof path has two stages:

Single-node duplicate-publisher tests produce the same ladder twice with different publisher identities and assert byte-for-byte BLAKE3 equality for every generated init and media segment. The duplicate_publishers_same_input_produce_identical_cmaf_ladder_bytes test is part of the default Rust test path when ffmpeg is present; it is not an ignored E2E.
Production verification runs the same channel on two real publishers long enough to measure duplicate media convergence, hash divergence, missing objects, and backfill behavior in Grafana.

The goal is not just "two publishers are online." Success requires numbers backed by elapsed production time, plus dashboards that show duplicate hits, misses, and archive repair.

Consequences

Manifest ids cannot be used as the archive dedupe key for duplicate publishers. Operators get a clear signal when two publishers produce identical bytes versus merely announcing the same channel. If encoder determinism changes, the single-node test fails before production redundancy silently degrades.

Alternatives considered

Dedupe by manifest id. This preserves envelope identity but misses the resilience property because duplicate publishers necessarily produce different envelopes.
Dedupe by logical channel and time only. This can hide encoder divergence and promote bad redundancy before byte-level media equality is proven.
Disable duplicate publishers until the scheduler is perfect. This avoids conflict handling but weakens live resilience and leaves the archive data layer untested.

Rollout/teardown

Roll forward by landing the local deterministic test, adding miss/duplicate metrics to the archive scrape surface, then running two publishers for one logical channel in production. Roll back by disabling duplicate scheduling for that channel; existing content-addressed archive objects remain valid.

Implementation notes

The node-agent archive scrape now exposes duplicate-source and miss gauges without placing hashes in labels. Per node, role, broadcast, rendition, and track it reports duplicate matching hash sources, duplicate hash sequences, divergent hash sequences, and missing hash records. Grafana shows those next to archive ladder coverage so the production duplicate-publisher run has an operator-visible convergence and miss signal.
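The node-agent is Python, but the label discipline is language-neutral. This Rust sketch shows one way to render such a gauge in the Prometheus text exposition format using topology labels only. The metric name example_duplicate_sequences and the GaugeLabels type are hypothetical; the real node-agent gauge names are not given here.

```rust
use std::fmt::Write;

/// Hypothetical label set mirroring the per-node/role/broadcast/rendition/track
/// dimensions above. Hashes are deliberately not labels, so series cardinality
/// grows with topology rather than with every new media fragment.
pub struct GaugeLabels<'a> {
    pub node: &'a str,
    pub role: &'a str,
    pub broadcast: &'a str,
    pub rendition: &'a str,
    pub track: &'a str,
}

/// Escapes a label value as the Prometheus text exposition format requires:
/// backslash, double quote, and newline.
pub fn escape_label(v: &str) -> String {
    v.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Appends one gauge sample line such as `name{node="a",...} 3`.
pub fn render_gauge(out: &mut String, name: &str, l: &GaugeLabels, value: u64) {
    // Writing into a String cannot fail, so the fmt::Result is ignored.
    let _ = writeln!(
        out,
        "{}{{node=\"{}\",role=\"{}\",broadcast=\"{}\",rendition=\"{}\",track=\"{}\"}} {}",
        name,
        escape_label(l.node),
        escape_label(l.role),
        escape_label(l.broadcast),
        escape_label(l.rendition),
        escape_label(l.track),
        value
    );
}
```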

ec-node archive-convergence is the primary proof surface for duplicate media identity. It compares named archive manifest roots directly inside the Rust node binary, groups records by logical stream, rendition, track, and sequence, and only returns ok when every expected sequence has matching duplicate source hashes with no missing or divergent sequence. It also requires archive records to carry at least two distinct source_node values, so mirrored global-origin manifests cannot pass as independent publishers. This keeps the media-data invariant in the already-shipped Rust artifact instead of extending the Python node-agent.

Rollout gates should use ec-node archive-convergence --require-ok: the command emits the JSON report either way, but --require-ok exits non-zero unless duplicate convergence is actually proven. ec-node archive-convergence --prometheus renders the same Rust convergence report as scrapeable every_channel_archive_* gauges for duplicate source records, duplicate sequences, divergent sequences, source-local divergence, missing hashes, missing source identity, media timing conflicts, record source count, and pass/fail state. This gives Grafana a Rust-owned proof metric path while the older node-agent ladder metrics remain available during migration.

ec-node archive-convergence-serve keeps that proof path live for Prometheus: it serves /health and /metrics, recomputes convergence on each scrape, and emits scrape_ok=0 metrics instead of disappearing when manifests are missing or not ready. Production Grafana can therefore distinguish a healthy metrics target from an unproven duplicate-publisher run.

The Nix services.every-channel.ec-node.archive.convergence.proofs option turns those Rust proof servers into named systemd units. Each proof must name at least two NAME=PATH sources and gets a dedicated listen address, so operators can add one Prometheus scrape target per duplicate channel without resurrecting the Python node-agent as the proof oracle.
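The per-sequence classification that archive-convergence performs can be sketched as follows. This is a hypothetical reconstruction in plain Rust, not the ec-node source: record and report field names are assumptions, hashes are hex strings, and the sketch assumes its input is already filtered to one logical channel, so the set of expected sources is global to the input.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical archive index record; field names are illustrative.
#[derive(Clone, Debug)]
pub struct ConvergenceRecord {
    pub stream: String,
    pub rendition: String,
    pub track: String,
    pub sequence: u64,
    pub source_node: Option<String>, // None models older records without identity
    pub blake3_hex: String,
}

#[derive(Debug, Default, PartialEq)]
pub struct ConvergenceReport {
    pub duplicate_sequences: usize,
    pub divergent_sequences: usize,
    pub source_local_divergent_sequences: usize,
    pub missing_hashes: usize,
    pub missing_source_identity: usize,
    pub distinct_sources: usize,
    pub ok: bool,
}

type Key = (String, String, String, u64);

pub fn archive_convergence(records: &[ConvergenceRecord]) -> ConvergenceReport {
    let mut report = ConvergenceReport::default();
    let mut sources: BTreeSet<String> = BTreeSet::new();
    // (stream, rendition, track, sequence) -> source_node -> hashes seen
    let mut groups: BTreeMap<Key, BTreeMap<String, BTreeSet<String>>> = BTreeMap::new();
    for rec in records {
        let Some(src) = &rec.source_node else {
            report.missing_source_identity += 1;
            continue;
        };
        sources.insert(src.clone());
        groups
            .entry((rec.stream.clone(), rec.rendition.clone(), rec.track.clone(), rec.sequence))
            .or_default()
            .entry(src.clone())
            .or_default()
            .insert(rec.blake3_hex.clone());
    }
    report.distinct_sources = sources.len();
    for by_source in groups.values() {
        // One source with several hashes for one sequence contradicts itself.
        if by_source.values().any(|hashes| hashes.len() > 1) {
            report.source_local_divergent_sequences += 1;
            continue;
        }
        // Every known source is expected to have a record for this sequence.
        report.missing_hashes += sources.iter().filter(|s| !by_source.contains_key(*s)).count();
        let distinct: BTreeSet<&String> = by_source.values().flatten().collect();
        if by_source.len() >= 2 && distinct.len() == 1 {
            report.duplicate_sequences += 1;
        } else if distinct.len() > 1 {
            report.divergent_sequences += 1;
        }
    }
    report.ok = report.distinct_sources >= 2
        && report.duplicate_sequences > 0
        && report.divergent_sequences == 0
        && report.source_local_divergent_sequences == 0
        && report.missing_hashes == 0
        && report.missing_source_identity == 0;
    report
}
```

The distinct_sources check is what stops a mirrored manifest from passing: copies carrying one source_node collapse to a single source no matter how many records match.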
Forge enables an initial la-kcop-publisher-origin proof target on 127.0.0.1:7812, and Prometheus scrapes it alongside the other local every.channel targets. Until two real publisher manifest roots are mounted on or fetched into Forge, the target intentionally uses the Forge manifest root as a placeholder peer and must report unproven convergence rather than a green duplicate-publisher proof. Forge also exposes a static two-NUC proof target, la-kcet-remote-publisher-origin, once that channel is the live converged duplicate sample.

Dynamic Headscale file-SD remains useful for discovery, but it can include relays and stale nodes. Duplicate-publisher proof should therefore use an explicit publisher pair, or future scheduler group labels, so that unrelated agents do not turn a passing channel red. The static proof exports its own Rust convergence gauges rather than gating on broad legacy Prometheus aggregates, because older node-agent archive metrics do not yet carry enough proof-role labels to avoid summing stale divergence from unrelated scrape targets.

ec-node archive-convergence-measure is the primary production proof harness. It fetches named node-agent /v1/archive-manifest samples or direct manifest JSONL URLs, writes bounded temporary manifest roots, reuses the Rust archive-convergence report, and optionally queries Prometheus for the Grafana-facing duplicate/miss series. A production run only counts as complete when the report has elapsed samples, matching duplicate media hashes, zero divergent hash sequences, and live Prometheus series for the duplicate/miss gauges. The measurement groups records by archive record source identity, not by the URL used to fetch a manifest, and reports source identity failures when the sample is too weak to prove independent publisher data. The older scripts/measure-duplicate-publishers.py stays compatibility-only until live operators and Forge jobs are switched to the Rust command.

The convergence report carries bounded divergent-sequence samples with per-source hash, byte size, receive time, source node/session, CAS path, and media timing when present, so a red proof is immediately actionable without fetching full manifests by hand. It also reports a non-blocking media-timing-missing count and Prometheus gauge: hash equality can still prove duplicate bytes, but missing timing means a divergent proof cannot yet classify whether the mismatch is a phase/windowing problem or an encoder byte problem.

Publisher service builders must pass proof cadence explicitly. Both the node-agent publisher supervisor and the Nix systemd publisher module set --publisher-archive-segment-duration-ms and --publisher-start-boundary-ms by default, so netbooted NUCs do not depend on stale hotpatch CLI defaults when aligning duplicate publisher proof windows.
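The completion rule can be written as a small predicate that lists every unmet condition. This is a hypothetical sketch: MeasurementSummary and Incomplete are illustrative names, the minimum elapsed window is a caller-supplied assumption, and the two-sample floor mirrors the at-least-two-elapsed-samples rule used for the node-agent sample endpoint.

```rust
use std::time::Duration;

/// Hypothetical summary of one measurement run; field names are illustrative.
pub struct MeasurementSummary {
    pub samples: usize,
    pub elapsed: Duration,
    pub matching_duplicate_sequences: usize,
    pub divergent_sequences: usize,
    pub source_identity_failures: usize,
    pub prometheus_series_present: bool,
}

#[derive(Debug, PartialEq)]
pub enum Incomplete {
    TooFewSamples,
    WindowNotElapsed,
    NoMatchingDuplicates,
    DivergentSequences,
    WeakSourceIdentity,
    MissingPrometheusSeries,
}

/// Returns every reason the run does not yet count as complete; an empty
/// list means all of the production proof criteria are satisfied.
pub fn incomplete_reasons(m: &MeasurementSummary, min_elapsed: Duration) -> Vec<Incomplete> {
    let mut reasons = Vec::new();
    if m.samples < 2 {
        reasons.push(Incomplete::TooFewSamples);
    }
    if m.elapsed < min_elapsed {
        reasons.push(Incomplete::WindowNotElapsed);
    }
    if m.matching_duplicate_sequences == 0 {
        reasons.push(Incomplete::NoMatchingDuplicates);
    }
    if m.divergent_sequences > 0 {
        reasons.push(Incomplete::DivergentSequences);
    }
    if m.source_identity_failures > 0 {
        reasons.push(Incomplete::WeakSourceIdentity);
    }
    if !m.prometheus_series_present {
        reasons.push(Incomplete::MissingPrometheusSeries);
    }
    reasons
}
```

Returning all reasons rather than a single boolean matches the report's goal of making a red proof actionable without a second investigation.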

ec-node archive-convergence-measure-serve turns that production proof harness into a live Prometheus target. Each /metrics scrape fetches one fresh sample from node-agent or direct JSONL manifest URLs, keeps a bounded in-memory sample window, and only reports measurement ok after the configured elapsed window has passed. This avoids blocking Prometheus scrapes for the measurement duration while still preventing two immediate samples from looking like a real production run. The service emits measurement-level gauges for fetch success, source record counts, invalid records, elapsed seconds, Prometheus series presence, and reasons, then appends the same every_channel_archive_* convergence gauges from the latest sample.

The service can also read Prometheus file-SD JSON from Forge's Headscale node-agent discovery and turn each discovered target into a sampled node-agent manifest source. The Nix services.every-channel.ec-node.archive.convergence.remoteProofs option creates these remote proof services as systemd units from either static NAME=URL endpoints or dynamic file-SD inputs. Forge now exposes la-kcop-remote-publisher-origin on 127.0.0.1:7813 using the live /var/lib/prometheus/every-channel-node-agents.json inventory. It must stay red until that inventory contains at least two independent publisher node-agents whose publisher.m4s records converge.
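The bounded sample window can be sketched as follows, assuming a generic sample type and caller-supplied Instants; SampleWindow is a hypothetical name, not the ec-node type. Reporting ok requires both two retained samples and a retained span that covers the configured elapsed window, so two back-to-back scrapes cannot pass.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Bounded in-memory window of scrape-time samples.
pub struct SampleWindow<T> {
    capacity: usize,
    required_elapsed: Duration,
    samples: VecDeque<(Instant, T)>,
}

impl<T> SampleWindow<T> {
    pub fn new(capacity: usize, required_elapsed: Duration) -> Self {
        assert!(capacity >= 2, "need at least two samples to measure elapsed time");
        SampleWindow { capacity, required_elapsed, samples: VecDeque::with_capacity(capacity) }
    }

    /// Records one fresh sample, evicting the oldest when the window is full.
    pub fn push(&mut self, at: Instant, sample: T) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back((at, sample));
    }

    /// Time spanned by the retained samples.
    pub fn elapsed(&self) -> Duration {
        match (self.samples.front(), self.samples.back()) {
            (Some((first, _)), Some((last, _))) => last.saturating_duration_since(*first),
            _ => Duration::ZERO,
        }
    }

    /// Two immediate samples never pass: the retained span must also cover
    /// the configured elapsed window.
    pub fn measurement_ok(&self) -> bool {
        self.samples.len() >= 2 && self.elapsed() >= self.required_elapsed
    }

    /// The newest sample, whose convergence gauges would be appended.
    pub fn latest(&self) -> Option<&T> {
        self.samples.back().map(|(_, s)| s)
    }
}
```

One consequence of bounding by count: capacity must be at least the required window divided by the scrape interval, plus one, or eviction keeps the retained span below the threshold and the measurement never goes green.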

When archive-serve ports are not reachable from the proof runner, the node-agent exposes a bounded, tailnet-authenticated /v1/archive-manifest sample endpoint. The harness can use that endpoint for each named publisher, compare local manifest records directly, and still require at least two elapsed samples before declaring success.

Production duplicate proof also requires archive-buffer freshness on each participating publisher. During mixed-generation rollouts, the current node-agent may supervise an older installed archive-hot-sync helper. The agent must probe helper flag support and omit optional arguments such as --link-mode when an older helper lacks them, because a silently failing archive-buffer sync can leave one publisher with healthy live streams but stale manifests.
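The agent itself is Python; this Rust sketch only illustrates the probing logic. It assumes the helper lists its flags in --help output (capturing that text, for example with std::process::Command, is left out) and treats a flag as supported only when it appears as a whole token. Function names are hypothetical.

```rust
/// True when the helper's `--help` text lists `flag` as a whole token, so
/// `--link-mode` does not match a longer flag that merely starts with it.
pub fn helper_supports_flag(help_text: &str, flag: &str) -> bool {
    help_text
        .split(|c: char| c.is_whitespace() || matches!(c, ',' | '=' | '[' | ']'))
        .any(|token| token == flag)
}

/// Appends optional arguments only when the installed helper understands
/// them, so an older helper is never started with a flag it would reject.
pub fn optional_sync_args(help_text: &str, link_mode: Option<&str>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(mode) = link_mode {
        if helper_supports_flag(help_text, "--link-mode") {
            args.push("--link-mode".to_string());
            args.push(mode.to_string());
        }
    }
    args
}
```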

The publisher buffer refresh is freshness-first: the node-managed sync must mirror full manifests, without fetching origin objects, before running the slower cache fill/prune pass. This lets convergence checks, Grafana scrape surfaces, and demand fetch see current BLAKE3 indexes even while proactive CAS object backfill is still catching up.

wt-archive stamps each archive index record with source_node and source_session. The Nix archive launcher passes the runtime hostname as --source-node; explicit CLI users can override it. Older records without this identity continue to parse, but proof commands and production measurement mark them incomplete instead of accepting them as independent publisher evidence.

Publisher-origin proof must be captured before relay/archive mirroring can collapse source identity. When node-agent archive buffering is enabled, supervised wt-publish processes pass --publisher-archive-output-dir, --publisher-archive-manifest-dir, and --publisher-archive-source-node. wt-publish now supervises the Rust publisher-proof-archive-source worker for that archive track. The worker splits the MPEG-TS source by source-clock windows, fresh-encodes each bounded window with the deterministic proof profile, stores the resulting media fragments under publisher.m4s in the same CAS/index format, and stamps them with node-agent source identity. The relay playback encoder remains continuous for watchability, but it is no longer the BLAKE3 data identity for duplicate-publisher proof. The source identity is explicit override first, then hostname plus a short hash of machine-id, with boot-id only as a fallback; hostname alone is not enough because publisher images can share names like ec-node. Production duplicate verification can therefore compare publisher.m4s from two publisher buffers without treating copied relay-origin manifests as independent sources.
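The identity precedence described above might look like this. It is a sketch under stated assumptions: the hostname-plus-12-hex-digit format mirrors identities like ec-node-c3546fa5abc3 seen in the production samples, but the hash function is a stand-in (std's DefaultHasher, which is not guaranteed stable across Rust releases) and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Treats a missing or blank value as absent.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|v| !v.is_empty())
}

/// Precedence: explicit override, then hostname plus a short hash of
/// machine-id, then boot-id as the fallback discriminator. Hostname alone
/// yields None, because publisher images can share names like ec-node.
pub fn derive_source_node(
    explicit: Option<&str>,
    hostname: &str,
    machine_id: Option<&str>,
    boot_id: Option<&str>,
) -> Option<String> {
    if let Some(id) = non_empty(explicit) {
        return Some(id.to_string());
    }
    let discriminator = non_empty(machine_id).or_else(|| non_empty(boot_id))?;
    // Stand-in hash: deterministic within one build, but a real implementation
    // would want a hash that is stable across toolchain releases.
    let mut hasher = DefaultHasher::new();
    discriminator.hash(&mut hasher);
    // Keep 48 bits so the suffix is exactly 12 hex digits.
    Some(format!("{}-{:012x}", hostname, hasher.finish() & 0xffff_ffff_ffff))
}
```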

Proof tooling defaults to publisher.m4s. The relay video track 0.m4s is useful playback data, but it is not duplicate-publisher proof: a publisher buffer may hold relay/cache records on 0.m4s that have no publisher source identity. Production convergence checks that sample 0.m4s should be treated as playback/archive-cache diagnostics, not byte-for-byte duplicate publisher evidence.

The first live publisher-origin measurements on 2026-06-08 showed correct distinct source labels but zero matching duplicate sequences for la-nbc4, la-pbs-socal, and la-kcet. The failure is useful: independent wt-publish processes currently start their fragment sequence and encoder chunk phase at local process start, so sequence 0 from two publishers is not necessarily the same broadcast moment. Duplicate-publisher proof therefore requires a shared chunk clock or scheduler-controlled aligned encoder phase before byte-for-byte archive convergence can pass in production.

Publisher-origin publisher.m4s records now require timed fMP4 fragments for global proof and map those fragments onto observed wall-clock epoch buckets instead of local process counters. The Rust writer learns track timescales from the init moov box, reads fragment moof/traf/tfhd+tfdt decode timestamps to reject untimed proof when possible, then assigns group_sequence = observed_epoch_bucket * bucket_stride + fragment_slot. Fragments that lack usable timing still fall back to the previous local counter so publishing does not fail hard on malformed metadata, but duplicate-publisher proof should use timed fragments. The wt-publish ffmpeg path also preserves source timestamps and uses closed-GOP, single-threaded x264 settings with forced keyframe cadence so independent publishers have a real chance of producing identical bytes for the same media time window.
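Reading the decode timestamp from a fragment needs only a few lines of ISO-BMFF box walking. This std-only sketch follows moof, then traf, then tfdt, and returns baseMediaDecodeTime for tfdt version 0 (32-bit) or version 1 (64-bit). It does not read mdhd timescales from moov, and it returns None rather than panicking on malformed input. Function names are illustrative, not the ec-node writer's.

```rust
fn be_u32(b: &[u8], at: usize) -> Option<u32> {
    let s = b.get(at..at.checked_add(4)?)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

fn be_u64(b: &[u8], at: usize) -> Option<u64> {
    let s = b.get(at..at.checked_add(8)?)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(s);
    Some(u64::from_be_bytes(buf))
}

/// Splits a byte range into ISO-BMFF boxes as (four-character type, payload).
/// Stops at the first malformed size instead of panicking.
fn boxes(b: &[u8]) -> Vec<([u8; 4], &[u8])> {
    let mut out = Vec::new();
    let mut pos = 0usize;
    while pos + 8 <= b.len() {
        let size32 = be_u32(b, pos).unwrap_or(0);
        let kind = [b[pos + 4], b[pos + 5], b[pos + 6], b[pos + 7]];
        let (header, size) = match size32 {
            0 => (8, b.len() - pos), // box runs to the end of the buffer
            1 => match be_u64(b, pos + 8) {
                Some(large) => (16, large as usize), // 64-bit largesize
                None => break,
            },
            n => (8, n as usize),
        };
        let end = match pos.checked_add(size) {
            Some(end) if size >= header && end <= b.len() => end,
            _ => break,
        };
        out.push((kind, &b[pos + header..end]));
        pos = end;
    }
    out
}

/// Returns baseMediaDecodeTime from the first moof/traf/tfdt, if present.
pub fn first_tfdt(fragment: &[u8]) -> Option<u64> {
    for (kind, moof) in boxes(fragment) {
        if &kind != b"moof" {
            continue;
        }
        for (kind, traf) in boxes(moof) {
            if &kind != b"traf" {
                continue;
            }
            for (kind, tfdt) in boxes(traf) {
                if &kind == b"tfdt" {
                    // tfdt is a FullBox: 1 byte version, 3 bytes flags, then the time.
                    return match *tfdt.first()? {
                        0 => be_u32(tfdt, 4).map(u64::from),
                        1 => be_u64(tfdt, 4),
                        _ => None,
                    };
                }
            }
        }
    }
    None
}
```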

A later live run on 2026-06-08 found a stricter local invariant before cross-publisher byte equality: each publisher must produce at most one hash for a given source_node and group_sequence. Production publisher.m4s samples for la-kcop and la-ktla showed multiple hashes from the same source in the same sequence bucket because real fMP4 fragments can arrive faster than the configured proof segment duration, and the writer rounded decode time into repeated buckets. The writer now uses a fixed per-epoch bucket stride and increments an in-bucket fragment slot when multiple timed fragments arrive inside the same proof duration. This keeps source-local manifests unique while allowing independently restarted publishers to align on the same observed wall-clock bucket. ec-node archive-convergence reports this separately as source_local_divergent_sequences so operator tooling can distinguish a self-contradicting publisher from two publishers that simply disagree about the same sequence. Because bucket-strided proof sequences intentionally leave numeric gaps, archive convergence uses the observed sparse sequence union for publisher-origin manifests. Dense contiguous sequence ranges remain available in the simulation layer when a model explicitly expects every integer sequence.
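The bucket-stride scheme can be sketched as a small stateful allocator. ProofSequencer is a hypothetical name, and the 1001 ms cadence and stride of 4096 used in the example are illustrative assumptions, not confirmed production settings. Slot state lives with the stream rather than with any one chunk, and it is bounded by the number of tracked buckets.

```rust
use std::collections::BTreeMap;

pub struct ProofSequencer {
    proof_duration_ms: u64,
    bucket_stride: u64,
    max_tracked_buckets: usize,
    next_slot: BTreeMap<u64, u64>, // wall-clock bucket -> next free slot
}

impl ProofSequencer {
    pub fn new(proof_duration_ms: u64, bucket_stride: u64, max_tracked_buckets: usize) -> Self {
        assert!(proof_duration_ms > 0 && bucket_stride > 0 && max_tracked_buckets > 0);
        ProofSequencer { proof_duration_ms, bucket_stride, max_tracked_buckets, next_slot: BTreeMap::new() }
    }

    /// group_sequence = bucket * bucket_stride + fragment_slot, where the
    /// bucket comes from the observed Unix wall clock. Returns None when a
    /// bucket has used every slot, rather than reusing a sequence.
    pub fn assign(&mut self, wall_clock_ms: u64) -> Option<u64> {
        let bucket = wall_clock_ms / self.proof_duration_ms;
        let slot = self.next_slot.entry(bucket).or_insert(0);
        if *slot >= self.bucket_stride {
            return None;
        }
        let sequence = bucket.checked_mul(self.bucket_stride)?.checked_add(*slot)?;
        *slot += 1;
        // Bounded state: forget the oldest buckets once too many are tracked.
        while self.next_slot.len() > self.max_tracked_buckets {
            let oldest = *self.next_slot.keys().next().expect("non-empty map");
            self.next_slot.remove(&oldest);
        }
        Some(sequence)
    }
}
```

With those illustrative values, a fragment observed at Unix time 1780924942797 ms falls in bucket 1779145797 and receives group_sequence 7287381184512; a second fragment in the same bucket receives 7287381184513, and the next bucket starts at 7287381188608. The numeric gaps between buckets are why convergence uses the observed sparse sequence union.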

The 2026-06-08 live la-kcet/publisher.m4s sample from Forge confirmed that both publishers now emit distinct source identities (ec-node-c3546fa5abc3 and ec-node-72cf1c3aa196) with no missing source identity records on the sampled publisher-origin manifests. It also confirmed the remaining bug: 156 shared publisher-origin sequences had zero byte-for-byte BLAKE3 matches and 156 divergent hashes. The next production fix must align the publisher chunk clock and encoded fMP4 byte stream, not merely improve scrape or Grafana plumbing.

After the wall-clock bucket hotpatch, the same live proof no longer has fake sparse-range missing IDs: la-kcet/publisher.m4s reported 376 observed proof sequences, zero missing source identities, zero source-local divergent sequences, and 234 divergent shared sequences. A byte-level sample for sequence 7287381184512 had different sizes, different BLAKE3 hashes, different tfdt base-media-decode-times (210210 versus 0), and different mdat payload prefixes. Across that sampled window there were zero common fragment hashes even when sequence IDs were ignored, proving that the remaining failure was independent-encoder media phase and fMP4 payload determinism, not an archive manifest identity bug.

A later la-kcop/publisher.m4s sample exposed a stricter live-source bug: source-window proof records were using unsynced MPEG-TS PCR chunk indexes as group_sequence when the OTA UTC clock was unavailable, causing restart-dependent jumps such as 93M, 135M, 341M, and 390M. The source-proof writer now uses the chunk UTC start only when the chopper reports synced timing, otherwise it falls back to the local wall-clock window start, and rewrites fMP4 tfdt onto that shared window before hashing. The live HTTP proof worker also retries transient source opens/reader failures in unbounded live mode, so a tuner 503 or malformed TS burst is skipped/retried instead of killing the publisher proof process.

The synced source-window clock must use the chopper's exact global chunk index, not integer UTC seconds. A 1001 ms proof cadence makes whole-second UTC start metadata lossy: adjacent source windows can share the same utc_start_unix, which caused one publisher to write several different hashes under the same source-local group_sequence. Synced chunks therefore use ChunkTiming.chunk_index directly; only unsynced chunks fall back to local wall-clock receipt. The live source-window proof writer also keeps subfragment slot allocation as stream state instead of per-chunk state. Real source windows can be emitted in more than one proof chunk for the same media timing sequence; resetting the slot counter for every chunk reused the same group_sequence and made one healthy publisher look self-divergent. The counter is bounded so the long-running live worker does not accumulate unbounded state.
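The clock choice reduces to a small branch. In this sketch, ChunkTiming models only the two fields the text relies on, lossy_window_from_utc_seconds is a hypothetical reconstruction of the whole-second failure mode (not the original code), and window_start_tfdt assumes tfdt is rewritten to the shared window start in track timescale units.

```rust
/// Hypothetical subset of the chopper's timing report.
pub struct ChunkTiming {
    pub synced: bool,
    pub chunk_index: u64, // exact global chunk index, meaningful only when synced
}

/// Synced chunks use the chopper's exact global chunk index; unsynced chunks
/// fall back to the local wall-clock window of receipt, never to a PCR index.
pub fn source_window_sequence(timing: &ChunkTiming, local_receipt_ms: u64, cadence_ms: u64) -> u64 {
    if timing.synced {
        timing.chunk_index
    } else {
        local_receipt_ms / cadence_ms
    }
}

/// Illustrates the bug only: rebuilding the window from whole-second UTC is
/// lossy, so two adjacent seconds can land on the same window.
pub fn lossy_window_from_utc_seconds(utc_start_unix: u64, cadence_ms: u64) -> u64 {
    utc_start_unix * 1000 / cadence_ms
}

/// Start of `window` expressed in `timescale` ticks; u128 avoids overflow.
pub fn window_start_tfdt(window: u64, cadence_ms: u64, timescale: u32) -> u64 {
    (window as u128 * cadence_ms as u128 * timescale as u128 / 1000) as u64
}
```

For example, with a 1001 ms cadence, UTC starts of 0 s and 1 s both map to window 0 in the lossy form, which is exactly the kind of collision that produced multiple hashes under one source-local group_sequence.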

wt-publish now has an explicit Unix-epoch start boundary, defaulting to the publisher-origin proof cadence. After relay setup and immediately before spawning ffmpeg it waits until the next boundary, so a newly restarted duplicate publisher starts its forced-keyframe clock on the same global cadence as already-running publishers. This does not by itself prove byte equality; it removes the local-process-start phase error from the live publisher path and gives rollout measurement a deterministic knob (--publisher-start-boundary-ms 0 disables it). The live ffmpeg argument plan is factored into a Rust unit-testable helper so future timestamp/keyframe changes are pinned in ec-node instead of being inferred from node-agent process strings or production samples.
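The boundary wait is modular arithmetic on Unix milliseconds. This is a hypothetical sketch of the timing logic only: it assumes that being exactly on a boundary means no wait, and it does not model relay setup or the ffmpeg spawn that follows.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Delay from `now_ms` (Unix epoch) until the next multiple of `boundary_ms`.
/// A boundary of 0 disables the wait, as `--publisher-start-boundary-ms 0` does.
pub fn delay_to_next_boundary(now_ms: u64, boundary_ms: u64) -> Duration {
    if boundary_ms == 0 {
        return Duration::ZERO;
    }
    let rem = now_ms % boundary_ms;
    if rem == 0 {
        Duration::ZERO
    } else {
        Duration::from_millis(boundary_ms - rem)
    }
}

/// Blocks until the next global boundary, immediately before spawning the encoder.
pub fn wait_for_start_boundary(boundary_ms: u64) {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0);
    std::thread::sleep(delay_to_next_boundary(now_ms, boundary_ms));
}
```

Because every publisher computes the same multiples of boundary_ms from the shared Unix epoch, a restarted publisher joins the cadence of already-running ones without any coordination message.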

The first post-start-clock live sample still failed duplicate byte identity: both publishers landed in the same wall-clock proof bucket, but one fragment carried tfdt=390390 while the other carried tfdt=30030, matching the staggered restart gap. Their mdat prefixes differed too, which means a continuous x264 encoder keeps enough local history that a later restart cannot prove byte equality merely by joining the same wall-clock cadence. The live profile therefore enables x264 stitchable=1 alongside closed GOP, no scenecut, no B-frames, no lookahead, and one thread. If that still does not converge in production, the next fix is a deliberately stateless per-fragment encode or a Rust-owned media clock/segmenter that resets encoder history at each proof boundary.
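A Rust helper that pins these encoder settings might look like the sketch below. The function name and parameters are hypothetical, and this document does not list the exact flags ec-node passes; each flag here is a standard ffmpeg or x264 option chosen to match a property named above (preserved source timestamps, closed GOP, no scenecut, no B-frames, no lookahead, one thread, stitchable=1, forced keyframe cadence, fragmented MP4 output).

```rust
/// Builds a hypothetical ffmpeg argument plan for a deterministic live proof
/// encode. `keyframe_interval_s` is passed as text, e.g. "1.001".
pub fn live_proof_ffmpeg_args(input: &str, output: &str, keyframe_interval_s: &str) -> Vec<String> {
    let force_key_frames = format!("expr:gte(t,n_forced*{})", keyframe_interval_s);
    let args = [
        "-hide_banner",
        "-nostdin",
        "-copyts", // preserve source timestamps
        "-i",
        input,
        "-map",
        "0:v:0",
        "-map",
        "0:a:0?", // audio is optional
        "-c:v",
        "libx264",
        "-threads",
        "1",
        "-flags",
        "+cgop", // closed GOP
        // no scenecut, no B-frames, no lookahead, stitchable, single thread
        "-x264-params",
        "scenecut=0:bframes=0:rc-lookahead=0:stitchable=1:threads=1",
        "-force_key_frames",
        force_key_frames.as_str(),
        "-c:a",
        "aac",
        "-f",
        "mp4",
        "-movflags",
        "+frag_keyframe+empty_moov+default_base_moof",
        output,
    ];
    args.iter().map(|a| a.to_string()).collect()
}
```

Keeping the plan in a pure function lets a unit test assert on the exact argument vector, so a timestamp or keyframe change shows up as a test diff rather than as a production sample.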

The follow-up production hotpatch moved the start-boundary wait to immediately before ffmpeg spawn, enabled stitchable=1, and restarted both publisher nodes in the same batch. The latest la-kcet sample still reported zero matching duplicate hashes with no missing source identity and no source-local divergence. A final sampled shared sequence differed by hundreds of milliseconds of receive time and by media size (439737 versus 270283 bytes for the video fragment), so the remaining mismatch is not just MP4 timestamp metadata. Production duplicate proof now needs a stateless fragment boundary: either encode each proof segment from the same bounded source window with fresh encoder state, or make the Rust media pipeline own exact frame-window capture before calling ffmpeg/x264.

Archive manifests now carry optional fMP4 media timing for publisher-origin fragments. The archive-convergence gate treats equal archive group sequence IDs with different media sequence or decode-time metadata as media_sequence_conflict, even if the byte hash happens to match. This keeps production proof aligned with the Rust simulation model: a duplicate publisher only proves the same broadcast moment when the archive sequence and media window agree.
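The conflict rule can be sketched as a comparison of two records that share an archive group sequence. MediaTiming and Verdict are hypothetical types. The timing-missing flag is reported alongside the verdict rather than blocking it, consistent with the non-blocking media-timing-missing count.

```rust
/// Hypothetical per-record fMP4 timing: mfhd sequence number and tfdt.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MediaTiming {
    pub media_sequence: u32,
    pub base_decode_time: u64,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Duplicate,
    Divergent,
    MediaSequenceConflict,
}

/// Compares two records that share one archive group sequence. Returns the
/// verdict plus whether timing was missing on either side.
pub fn classify(
    a_hash: &str,
    a_timing: Option<MediaTiming>,
    b_hash: &str,
    b_timing: Option<MediaTiming>,
) -> (Verdict, bool) {
    match (a_timing, b_timing) {
        // Same archive sequence but a different media window: a conflict,
        // even when the bytes happen to hash the same.
        (Some(a), Some(b)) if a != b => (Verdict::MediaSequenceConflict, false),
        (a, b) => {
            let timing_missing = a.is_none() || b.is_none();
            let verdict = if a_hash == b_hash { Verdict::Duplicate } else { Verdict::Divergent };
            (verdict, timing_missing)
        }
    }
}
```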

The first stateless proof primitives are now in ec-node. publisher-proof-segment takes one bounded MPEG-TS source-clock window, runs a fresh deterministic x264/AAC fMP4 encode, splits the result into init bytes and media fragments, and emits BLAKE3 hashes for each. publisher-proof-windows uses the Rust MPEG-TS source-clock splitter first, then fresh-encodes each bounded window and reports per-window source TS, init, and media hashes. Proof windows carry explicit MPEG-TS decoder context with --preroll-packets, defaulting to the repo-owned WT_PUBLISH_PROOF_PREROLL_PACKETS budget, so mid-GOP windows do not silently depend on best-effort decoder recovery. Focused Rust tests fresh-encode the same bounded input and the same finite source-window campaign twice and assert byte-for-byte identical proof hashes.
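Assigning packets to source-clock windows with a preroll budget can be sketched over per-packet timestamps. This simplification uses 90 kHz PTS values as the source clock, ignores 33-bit wraparound and per-PID handling, and represents windows as packet index ranges; the real splitter and its preroll accounting may differ.

```rust
/// One source-clock window, as packet index ranges into the input.
#[derive(Debug, PartialEq)]
pub struct Window {
    pub index: u64,           // window number on the shared source clock
    pub preroll_start: usize, // first packet handed to the decoder (context only)
    pub start: usize,         // first packet that belongs to this window
    pub end: usize,           // one past the last packet of this window
}

/// Assigns packets to cadence windows by their 90 kHz timestamps. Packets
/// without a timestamp stay in the current window; packets before the first
/// timestamp only ever appear as preroll.
pub fn split_windows(pts_90k: &[Option<u64>], cadence_ms: u64, preroll_packets: usize) -> Vec<Window> {
    let mut out: Vec<Window> = Vec::new();
    let mut current: Option<u64> = None;
    for (i, pts) in pts_90k.iter().enumerate() {
        let Some(pts) = *pts else { continue };
        let window = pts / 90 / cadence_ms; // 90 kHz ticks -> ms -> window
        if current != Some(window) {
            if let Some(prev) = out.last_mut() {
                prev.end = i;
            }
            out.push(Window {
                index: window,
                preroll_start: i.saturating_sub(preroll_packets),
                start: i,
                end: pts_90k.len(),
            });
            current = Some(window);
        }
    }
    out
}
```

The preroll range deliberately overlaps the previous window: those packets give a fresh decoder enough context to start mid-GOP, while only start..end contributes to the window's own media.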

publisher-proof-duplicates is the single-node duplicate-publisher gate for the stateless path. It runs publisher-proof-windows independently under at least two publisher identity labels, defaults to publisher-a and publisher-b, and compares source TS, init, and media fragment BLAKE3 hashes for every source-clock window. --require-ok exits non-zero unless every compared window matches, and duplicate publisher labels are rejected so the proof cannot accidentally collapse to one source identity. publisher-proof-compare is the cross-machine stateless proof gate: each publisher can run publisher-proof-windows against the same bounded source TS file locally, copy the JSON report back to the operator host, and compare the reports by named publisher. It rejects mismatched chunk cadence, missing windows, source TS hash mismatches, init hash mismatches, media fragment hash mismatches, and empty media windows.
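A two-report version of the compare gate might look like this. The report shape, field names, and Mismatch variants are hypothetical; the checks mirror the listed rejections (chunk cadence, missing windows, source TS, init, and media hash mismatches, empty media windows) plus the duplicate-label rule.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Clone, Debug, PartialEq)]
pub struct WindowHashes {
    pub source_ts: String,
    pub init: String,
    pub media: Vec<String>,
}

pub struct ProofReport {
    pub publisher: String,
    pub chunk_ms: u64,
    pub windows: BTreeMap<u64, WindowHashes>,
}

#[derive(Debug, PartialEq)]
pub enum Mismatch {
    SamePublisherLabel,
    ChunkCadence,
    MissingWindow(u64),
    EmptyMedia(u64),
    SourceTs(u64),
    Init(u64),
    Media(u64),
}

pub fn compare_reports(a: &ProofReport, b: &ProofReport) -> Vec<Mismatch> {
    let mut out = Vec::new();
    if a.publisher == b.publisher {
        out.push(Mismatch::SamePublisherLabel);
    }
    if a.chunk_ms != b.chunk_ms {
        out.push(Mismatch::ChunkCadence);
        return out; // windows are not comparable across cadences
    }
    let all: BTreeSet<u64> = a.windows.keys().chain(b.windows.keys()).copied().collect();
    for w in all {
        let (Some(x), Some(y)) = (a.windows.get(&w), b.windows.get(&w)) else {
            out.push(Mismatch::MissingWindow(w));
            continue;
        };
        if x.media.is_empty() || y.media.is_empty() {
            out.push(Mismatch::EmptyMedia(w));
            continue;
        }
        if x.source_ts != y.source_ts {
            out.push(Mismatch::SourceTs(w));
        }
        if x.init != y.init {
            out.push(Mismatch::Init(w));
        }
        if x.media != y.media {
            out.push(Mismatch::Media(w));
        }
    }
    out
}
```

Comparing more than two publishers reduces to running this pairwise against one reference report.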

publisher-proof-remote-compare is the production operator harness for that cross-machine gate. It copies one bounded .ts proof input to each named SSH target, runs ec-node publisher-proof-windows on the target, stores each returned JSON report under the local output directory, writes a compare.json, and returns the existing compare report with upload/proof timing. Remote labels use the same single-component validation as publisher identities, remote proof roots are constrained to /tmp/every-channel-*, and cleanup is opt-in so the generated proof files remain inspectable unless the operator explicitly requests removal. This keeps the live proof path in Rust without making the Python node-agent a new oracle. It proves the machine/runtime/compiler boundary without requiring the two NUCs to share a live tuner at the exact same instant.
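The path guards can be sketched as two pure predicates. The accepted label character set is an assumption, and treating the suffix after /tmp/every-channel- as a single safe component is one reading of the constraint, not the confirmed implementation.

```rust
/// Single path component: non-empty, not "." or "..", and limited to ASCII
/// letters, digits, '-', '_' and '.'. The allowed set is an assumption.
pub fn valid_label(label: &str) -> bool {
    !label.is_empty()
        && label != "."
        && label != ".."
        && label.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
}

/// Accepts only /tmp/every-channel-<suffix> where the suffix is itself a
/// single safe component, so the root cannot escape /tmp or nest directories.
pub fn valid_remote_proof_root(path: &str) -> bool {
    path.strip_prefix("/tmp/every-channel-").map_or(false, valid_label)
}
```

Rejecting '/' in the suffix is what blocks traversal: a root such as /tmp/every-channel-../etc fails the label check before any remote command runs.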

publisher-proof-archive-source is the live archive implementation of the same proof model. It can read local source files directly, read plain HTTP MPEG-TS bodies directly for HDHomeRun-style sources, or fall back to an ffmpeg MPEG-TS copy reader for other inputs. Each emitted source-clock window is encoded with fresh proof state, archived as CAS-backed publisher.m4s records, and mapped to source-clock group sequences with explicit media timing metadata. A focused Rust regression now archives the same bounded TS input as two source nodes, then runs archive-convergence against the two manifest roots and requires full duplicate convergence with zero divergent or source-local divergent sequences.

Forge ci-gates now runs the publisher_proof and archive_convergence Rust filters before the distributed simulator campaign, so single-node byte-for-byte determinism, source-window archive proof semantics, and duplicate archive convergence are checked before production rollout evidence is considered. The next production step is to deploy the updated node binary and let fresh publisher.m4s source-window records age into the Grafana scrape window so live duplicate metrics can replace the older continuous-encoder divergence.
