Add duplicate publisher determinism proof

2026-06-10 03:28:55 -07:00
commit 91dad67fc2
parent 5d0f3077d3
18 changed files with 21569 additions and 595 deletions
--- a/evolution/proposals/ECP-0156-duplicate-publisher-deterministic-data-layer.md
+++ b/evolution/proposals/ECP-0156-duplicate-publisher-deterministic-data-layer.md
@ -0,0 +1,334 @@
+# ECP-0156: Duplicate Publisher Deterministic Data Layer
+
+Status: Draft
+
+## Context
+
+Two publisher nodes may broadcast the same logical channel at the same time. The archive and relay
+layers need this for resilience, but duplicate publishers currently risk looking like conflicting
+streams instead of convergent copies of the same media.
+
+## Decision
+
+Duplicate publishers are valid for a published channel. The data layer dedupes and verifies media by
+content identity, not by publisher envelope identity:
+
+- CMAF init and media segment bytes for the same input, ladder profile, and chunk cadence must be
+  byte-for-byte identical.
+- BLAKE3 media hashes and per-rung Merkle roots are the shared data identity.
+- Publisher manifests may carry different `stream_id`, `epoch_id`, `created_unix_ms`, signatures,
+  locators, and manifest ids.
+- The archive must treat matching media hashes from different publishers as corroborating sources.
+- Archive records must carry source identity. Two copied buffers with the same `source_node` are not
+  duplicate-publisher proof, even when their BLAKE3 hashes match.
+- Divergent hashes for the same logical channel, rendition, and media time are misses that must be
+  measured before the data is promoted as redundant.
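The corroboration rule in the Decision list can be sketched in Rust. This is a minimal illustration, not the `ec-node` implementation: `ArchiveRecord`, `Corroboration`, and `classify_slot` are hypothetical names, and `media_hash` stands in for a hex BLAKE3 digest computed elsewhere.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical archive index record, reduced to the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct ArchiveRecord {
    pub source_node: String,
    /// Hex digest of the media bytes (BLAKE3 in the proposal), computed elsewhere.
    pub media_hash: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Corroboration {
    /// At least two distinct source nodes agree on one hash.
    Corroborated,
    /// One hash, but only one distinct source node, however many copies exist.
    SingleSource,
    /// More than one hash for the same media slot: a miss to be measured.
    Divergent,
    /// No records for the slot.
    Missing,
}

/// Classify the records for ONE (logical channel, rendition, media time) slot.
pub fn classify_slot(records: &[ArchiveRecord]) -> Corroboration {
    let mut sources_by_hash: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for r in records {
        sources_by_hash
            .entry(r.media_hash.as_str())
            .or_default()
            .insert(r.source_node.as_str());
    }
    match sources_by_hash.values().next() {
        None => Corroboration::Missing,
        Some(_) if sources_by_hash.len() > 1 => Corroboration::Divergent,
        Some(sources) if sources.len() >= 2 => Corroboration::Corroborated,
        Some(_) => Corroboration::SingleSource,
    }
}
```

Two copies stamped with the same `source_node` classify as `SingleSource`, which is the case the Decision excludes from duplicate-publisher proof even when the hashes match.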
+
+## Verification
+
+The proof path has two stages:
+
+1. Single-node duplicate-publisher tests produce the same ladder twice with different publisher
+   identities and assert byte-for-byte BLAKE3 equality for every generated init and media segment.
+   The `duplicate_publishers_same_input_produce_identical_cmaf_ladder_bytes` test is part of the
+   default Rust test path when ffmpeg is present; it is not an ignored E2E.
+2. Production verification runs the same channel on two real publishers long enough to measure
+   duplicate media convergence, hash divergence, missing objects, and backfill behavior in Grafana.
+
+The goal is not just "two publishers are online." Success requires elapsed production time behind the
+numbers and dashboards that show duplicate hits, misses, and archive repair.
+
+## Consequences
+
+Manifest ids cannot be used as the archive dedupe key for duplicate publishers. Operators get a
+clear signal when two publishers produce identical bytes versus merely announcing the same channel.
+If encoder determinism changes, the single-node test fails before production redundancy silently
+degrades.
+
+## Alternatives considered
+
+- Dedupe by manifest id. This preserves envelope identity but misses the resilience property because
+  duplicate publishers necessarily produce different envelopes.
+- Dedupe by logical channel and time only. This can hide encoder divergence and promote bad
+  redundancy before byte-level media equality is proven.
+- Disable duplicate publishers until the scheduler is perfect. This avoids conflict handling but
+  weakens live resilience and leaves the archive data layer untested.
+
+## Rollout/teardown
+
+Roll forward by landing the local deterministic test, adding miss/duplicate metrics to the archive
+scrape surface, then running two publishers for one logical channel in production. Roll back by
+disabling duplicate scheduling for that channel; existing content-addressed archive objects remain
+valid.
+
+## Implementation notes
+
+The node-agent archive scrape now exposes duplicate-source and miss gauges without placing hashes in
+labels. Per node, role, broadcast, rendition, and track it reports duplicate matching hash sources,
+duplicate hash sequences, divergent hash sequences, and missing hash records. Grafana shows those
+next to archive ladder coverage so the production duplicate-publisher run has an operator-visible
+convergence and miss signal.
+
+`ec-node archive-convergence` is the primary proof surface for duplicate media identity. It compares
+named archive manifest roots directly inside the Rust node binary, groups records by logical stream,
+rendition, track, and sequence, and only returns `ok` when every expected sequence has matching
+duplicate source hashes with no missing or divergent sequence. It also requires archive records to
+carry at least two distinct `source_node` values, so mirrored global-origin manifests cannot pass as
+independent publishers. This keeps the media-data invariant in the already-shipped Rust artifact
+instead of extending the Python node-agent. Rollout gates should use
+`ec-node archive-convergence --require-ok`; the command emits the JSON report either way, but
+`--require-ok` exits non-zero unless duplicate convergence is actually proven.
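The `archive-convergence` grouping and pass condition described above can be sketched as follows. The type and function names are hypothetical, the expected sequence set is assumed to be the observed union across sources, and the real command's report fields are not reproduced here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Grouping key: logical stream, rendition, track, and sequence.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct SlotKey {
    pub stream: String,
    pub rendition: String,
    pub track: String,
    pub sequence: u64,
}

#[derive(Debug, Clone)]
pub struct Record {
    pub key: SlotKey,
    pub source_node: String,
    pub hash: String,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Report {
    pub distinct_sources: usize,
    pub matching: usize,
    pub divergent: usize,
    pub missing: usize,
    pub ok: bool,
}

pub fn converge(records: &[Record]) -> Report {
    let all_sources: BTreeSet<&str> = records.iter().map(|r| r.source_node.as_str()).collect();
    // Per slot: the set of sources that reported it and the set of hashes seen.
    let mut slots: BTreeMap<&SlotKey, (BTreeSet<&str>, BTreeSet<&str>)> = BTreeMap::new();
    for r in records {
        let slot = slots.entry(&r.key).or_default();
        slot.0.insert(r.source_node.as_str());
        slot.1.insert(r.hash.as_str());
    }
    let mut report = Report { distinct_sources: all_sources.len(), ..Report::default() };
    for (sources, hashes) in slots.values() {
        if hashes.len() > 1 {
            report.divergent += 1; // sources disagree about the bytes
        } else if sources.len() < all_sources.len() {
            report.missing += 1; // at least one source never archived this slot
        } else {
            report.matching += 1;
        }
    }
    // Pass only with two or more distinct sources and a fully matching slot set.
    report.ok = report.distinct_sources >= 2
        && report.matching > 0
        && report.divergent == 0
        && report.missing == 0;
    report
}
```

A mirrored manifest pair with a single `source_node` yields `distinct_sources == 1` and therefore never reports `ok`, whatever the hashes say.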
+`ec-node archive-convergence --prometheus` renders the same Rust convergence report as scrapeable
+`every_channel_archive_*` gauges for duplicate source records, duplicate sequences, divergent
+sequences, source-local divergence, missing hashes, missing source identity, media timing conflicts,
+record source count, and pass/fail state. This gives Grafana a Rust-owned proof metric path while
+the older node-agent ladder metrics remain available during migration.
+
+`ec-node archive-convergence-serve` keeps that proof path live for Prometheus: it serves `/health`
+and `/metrics`, recomputes convergence on each scrape, and emits `scrape_ok=0` metrics instead of
+disappearing when manifests are missing or not ready. Production Grafana can therefore distinguish a
+healthy metrics target from an unproven duplicate-publisher run.
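The "emit `scrape_ok=0` instead of disappearing" behaviour can be illustrated with a small renderer for the Prometheus text exposition format. The metric names with the `_example_` infix, the `proof` label, and `ProofState` are hypothetical; the source only states that the real gauges share the `every_channel_archive_*` prefix.

```rust
/// Hypothetical convergence outcome for one proof target.
pub struct ProofState {
    pub duplicate_sequences: u64,
    pub divergent_sequences: u64,
    pub ok: bool,
}

/// Escape a label value as the Prometheus text format requires.
fn escape_label(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Append one gauge sample line: name{proof="..."} value
fn gauge(out: &mut String, name: &str, proof: &str, value: u64) {
    out.push_str(&format!("{}{{proof=\"{}\"}} {}\n", name, escape_label(proof), value));
}

/// Render one scrape. `Err` means manifests were missing or not ready.
pub fn render_metrics(proof: &str, state: Result<&ProofState, &str>) -> String {
    let mut out = String::new();
    match state {
        Ok(s) => {
            gauge(&mut out, "every_channel_archive_example_scrape_ok", proof, 1);
            gauge(&mut out, "every_channel_archive_example_duplicate_sequences", proof, s.duplicate_sequences);
            gauge(&mut out, "every_channel_archive_example_divergent_sequences", proof, s.divergent_sequences);
            gauge(&mut out, "every_channel_archive_example_ok", proof, s.ok as u64);
        }
        Err(_) => {
            // Keep the series present so a dashboard can tell "target healthy,
            // proof unavailable" apart from "target gone".
            gauge(&mut out, "every_channel_archive_example_scrape_ok", proof, 0);
            gauge(&mut out, "every_channel_archive_example_ok", proof, 0);
        }
    }
    out
}
```

With this shape, an alert on an absent series fires only when the target itself is down, while `scrape_ok == 0` marks an unproven run.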
+The Nix `services.every-channel.ec-node.archive.convergence.proofs` option turns those Rust proof
+servers into named systemd units. Each proof must name at least two `NAME=PATH` sources and gets a
+dedicated listen address, so operators can add one Prometheus scrape target per duplicate channel
+without resurrecting the Python node-agent as the proof oracle.
+
+Forge enables an initial `la-kcop-publisher-origin` proof target on `127.0.0.1:7812` and Prometheus
+scrapes it alongside the other local every.channel targets. Until two real publisher manifest roots
+are mounted or fetched into Forge, the target intentionally uses the Forge manifest root as a
+placeholder peer and must report unproven convergence rather than green duplicate-publisher proof.
+Forge also exposes a static two-NUC `la-kcet-remote-publisher-origin` proof target once that channel
+is the live converged duplicate sample. Dynamic Headscale file-SD remains useful for discovery, but
+it can include relays and stale nodes; duplicate-publisher proof should use an explicit publisher
+pair or future scheduler group labels so unrelated agents do not turn a passing channel red.
+This static proof exports its own Rust convergence gauges rather than gating on broad legacy
+Prometheus aggregates, because older node-agent archive metrics do not yet carry enough proof-role
+labels to avoid summing stale divergence from unrelated scrape targets.
+
+`ec-node archive-convergence-measure` is the primary production proof harness. It fetches named
+node-agent `/v1/archive-manifest` samples or direct manifest JSONL URLs, writes bounded temporary
+manifest roots, reuses the Rust `archive-convergence` report, and optionally queries Prometheus for
+the Grafana-facing duplicate/miss series. A production run only counts as complete when the report
+has elapsed samples, matching duplicate media hashes, zero divergent hash sequences, and live
+Prometheus series for the duplicate/miss gauges. The measurement groups records by archive record
+source identity, not by the URL used to fetch a manifest, and reports source identity failures when
+the sample is too weak to prove independent publisher data. The older
+`scripts/measure-duplicate-publishers.py` stays compatibility-only until live operators and Forge
+jobs are switched to the Rust command.
+
+The convergence report carries bounded divergent-sequence samples with per-source hash, byte size,
+receive time, source node/session, CAS path, and media timing when present, so a red proof is
+immediately actionable without fetching full manifests by hand.
+It also reports a non-blocking media-timing-missing count and Prometheus gauge; hash equality can
+still prove duplicate bytes, but missing timing means a divergent proof cannot yet classify whether
+the mismatch is a phase/windowing problem or an encoder byte problem.
+
+Publisher service builders must pass proof cadence explicitly. Both the node-agent publisher
+supervisor and Nix systemd publisher module set `--publisher-archive-segment-duration-ms` and
+`--publisher-start-boundary-ms` by default, so netbooted NUCs do not depend on stale hotpatch CLI
+defaults when aligning duplicate publisher proof windows.
+
+`ec-node archive-convergence-measure-serve` turns that production proof harness into a live
+Prometheus target. Each `/metrics` scrape fetches one fresh sample from node-agent or direct JSONL
+manifest URLs, keeps a bounded in-memory sample window, and only reports measurement `ok` after the
+configured elapsed window has passed. This avoids blocking Prometheus scrapes for the measurement
+duration while still preventing two immediate samples from looking like a real production run.
+The service emits measurement-level gauges for fetch success, source record counts, invalid records,
+elapsed seconds, Prometheus series presence, and reasons, then appends the same
+`every_channel_archive_*` convergence gauges from the latest sample. The service can also read
+Prometheus file-SD JSON from Forge's Headscale node-agent discovery and turn each discovered target
+into a sampled node-agent manifest source. The Nix
+`services.every-channel.ec-node.archive.convergence.remoteProofs` option creates these remote proof
+services as systemd units from either static `NAME=URL` endpoints or dynamic file-SD inputs. Forge
+now exposes `la-kcop-remote-publisher-origin` on `127.0.0.1:7813` using the live
+`/var/lib/prometheus/every-channel-node-agents.json` inventory. It must stay red until that
+inventory contains at least two independent publisher node-agents whose `publisher.m4s` records
+converge.
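The "one sample per scrape, `ok` only after the elapsed window" rule for `archive-convergence-measure-serve` might look like the following. `SampleWindow` and its fields are hypothetical, and requiring every retained sample to have converged is an assumption of this sketch rather than a documented rule.

```rust
use std::collections::VecDeque;

/// One scrape-time sample: when it was taken and whether convergence held.
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    pub at_unix_ms: u64,
    pub converged: bool,
}

/// Bounded in-memory window of recent samples.
pub struct SampleWindow {
    samples: VecDeque<Sample>,
    max_samples: usize,
    required_elapsed_ms: u64,
}

impl SampleWindow {
    pub fn new(max_samples: usize, required_elapsed_ms: u64) -> Self {
        SampleWindow {
            samples: VecDeque::new(),
            max_samples: max_samples.max(2), // a window needs two endpoints
            required_elapsed_ms,
        }
    }

    /// Record one sample, dropping the oldest when the window is full.
    pub fn push(&mut self, sample: Sample) {
        if self.samples.len() == self.max_samples {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
    }

    /// Time between the oldest and newest retained samples.
    pub fn elapsed_ms(&self) -> u64 {
        match (self.samples.front(), self.samples.back()) {
            (Some(first), Some(last)) => last.at_unix_ms.saturating_sub(first.at_unix_ms),
            _ => 0,
        }
    }

    /// Two immediate samples are not enough: the window must also span the
    /// configured elapsed time.
    pub fn measurement_ok(&self) -> bool {
        self.samples.len() >= 2
            && self.elapsed_ms() >= self.required_elapsed_ms
            && self.samples.iter().all(|s| s.converged)
    }
}
```

Because each scrape only appends one sample, the scrape itself stays fast. Note that `max_samples` must be large enough for the retained samples to span `required_elapsed_ms` at the scrape interval in use, otherwise the window can never report `ok`.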
+
+When archive-serve ports are not reachable from the proof runner, the node-agent exposes a bounded,
+tailnet-authenticated `/v1/archive-manifest` sample endpoint. The harness can use that endpoint for
+each named publisher, compare local manifest records directly, and still require at least two elapsed
+samples before declaring success.
+
+Production duplicate proof also requires archive-buffer freshness on each participating publisher.
+During mixed-generation rollouts, the current node-agent may supervise an older installed
+`archive-hot-sync` helper. The agent must probe helper flag support and omit optional arguments such
+as `--link-mode` when an older helper lacks them, because a silently failing archive-buffer sync can
+leave one publisher with healthy live streams but stale manifests.
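One way to probe helper flag support is to inspect the helper's help text before building its argument list. The sketch below assumes the help text has already been captured (for example with `std::process::Command`); `help_mentions_flag` and `build_helper_args` are hypothetical names, and whether the real helper prints its flags in a help listing is itself an assumption.

```rust
/// True when `flag` appears as a whole token in the helper's help output.
pub fn help_mentions_flag(help_text: &str, flag: &str) -> bool {
    help_text
        .split(|c: char| c.is_whitespace() || matches!(c, ',' | '=' | '[' | ']'))
        .any(|token| token == flag)
}

/// Build the argument list: required arguments always, optional
/// (flag, value) pairs only when the installed helper advertises the flag.
pub fn build_helper_args(
    help_text: &str,
    required: &[&str],
    optional: &[(&str, &str)],
) -> Vec<String> {
    let mut args: Vec<String> = required.iter().map(|a| a.to_string()).collect();
    for (flag, value) in optional {
        if help_mentions_flag(help_text, flag) {
            args.push(flag.to_string());
            args.push(value.to_string());
        }
    }
    args
}
```

Matching whole tokens avoids treating a longer flag that merely starts with `--link-mode` as support for `--link-mode` itself.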
+
+The publisher buffer refresh is freshness-first: the node-managed sync must mirror full manifests
+without origin object fetch before running the slower cache fill/prune pass. This lets convergence
+checks, Grafana scrape surfaces, and demand fetch see current BLAKE3 indexes even when proactive CAS
+object backfill is still catching up.
+
+`wt-archive` stamps each archive index record with `source_node` and `source_session`. The Nix
+archive launcher passes the runtime hostname as `--source-node`; explicit CLI users can override it.
+Older records without this identity continue to parse, but proof commands and production measurement
+mark them incomplete instead of accepting them as independent publisher evidence.
+
+Publisher-origin proof must be captured before relay/archive mirroring can collapse source identity.
+When node-agent archive buffering is enabled, supervised `wt-publish` processes pass
+`--publisher-archive-output-dir`, `--publisher-archive-manifest-dir`, and
+`--publisher-archive-source-node`. `wt-publish` now supervises the Rust
+`publisher-proof-archive-source` worker for that archive track. The worker splits the MPEG-TS source
+by source-clock windows, fresh-encodes each bounded window with the deterministic proof profile,
+stores the resulting media fragments under `publisher.m4s` in the same CAS/index format, and stamps
+them with node-agent source identity. The relay playback encoder remains continuous for watchability,
+but it is no longer the BLAKE3 data identity for duplicate-publisher proof. The source identity is
+explicit override first, then hostname plus a short hash of machine-id, with boot-id only as a
+fallback; hostname alone is not enough because publisher images can share names like `ec-node`.
+Production duplicate verification can therefore compare `publisher.m4s` from two publisher buffers
+without treating copied relay-origin manifests as independent sources.
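The source-identity precedence described above (explicit override, then hostname plus a short machine-id hash, with boot-id only as a fallback) can be sketched as below. The function names are hypothetical and FNV-1a is a stand-in chosen so the example needs no external crate; the source does not say which hash the real code uses. The 12-hex-digit suffix only mirrors the shape of the identities quoted later in this document.

```rust
/// Stand-in short hash: 64-bit FNV-1a truncated to 48 bits (12 hex digits).
/// The real hash function is not specified in the source.
fn short_hash(input: &str) -> String {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for b in input.bytes() {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    format!("{:012x}", h & 0xffff_ffff_ffff)
}

fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Precedence: explicit override, then hostname + hash(machine-id),
/// then hostname + hash(boot-id), then bare hostname as a last resort.
pub fn source_identity(
    explicit: Option<&str>,
    hostname: &str,
    machine_id: Option<&str>,
    boot_id: Option<&str>,
) -> String {
    if let Some(id) = non_empty(explicit) {
        return id.to_string();
    }
    match non_empty(machine_id).or_else(|| non_empty(boot_id)) {
        Some(seed) => format!("{}-{}", hostname, short_hash(seed)),
        None => hostname.to_string(),
    }
}
```

Two images that share the hostname `ec-node` but have different machine ids get different identities, which is the property the paragraph above relies on.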
+
+Proof tooling defaults to `publisher.m4s`. The relay video track `0.m4s` is useful playback data,
+but it is not duplicate-publisher proof: a publisher buffer may hold relay/cache records on `0.m4s`
+that have no publisher source identity. Production convergence checks that sample `0.m4s` should be
+treated as playback/archive-cache diagnostics, not byte-for-byte duplicate publisher evidence.
+
+The first live publisher-origin measurements on 2026-06-08 showed correct distinct source labels but
+zero matching duplicate sequences for `la-nbc4`, `la-pbs-socal`, and `la-kcet`. The failure is
+useful: independent `wt-publish` processes currently start their fragment sequence and encoder chunk
+phase at local process start, so sequence `0` from two publishers is not necessarily the same
+broadcast moment. Duplicate-publisher proof therefore requires a shared chunk clock or
+scheduler-controlled aligned encoder phase before byte-for-byte archive convergence can pass in
+production.
+
+Publisher-origin `publisher.m4s` records now require timed fMP4 fragments for global proof and map
+those fragments onto observed wall-clock epoch buckets instead of local process counters. The Rust
+writer learns track timescales from the init `moov` box, reads fragment
+`moof/traf/tfhd+tfdt` decode timestamps to reject untimed proof when possible, then assigns
+`group_sequence = observed_epoch_bucket * bucket_stride + fragment_slot`. Fragments that lack usable
+timing still fall back to the previous local counter so publishing does not fail hard on malformed
+metadata, but duplicate-publisher proof should use timed fragments. The `wt-publish` ffmpeg path
+also preserves source timestamps and uses closed-GOP, single-threaded x264 settings with forced
+keyframe cadence so independent publishers have a real chance of producing identical bytes for the
+same media time window.
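The `group_sequence` formula above is plain integer arithmetic. In the sketch below, defining the epoch bucket as Unix milliseconds divided by the proof segment duration and using a stride of 4096 are both assumptions; they were picked because 1779145797 × 4096 equals the sequence ID `7287381184512` quoted later in this document, which is consistent with them but does not confirm them.

```rust
/// Assumed bucket definition: whole proof segments since the Unix epoch.
pub fn epoch_bucket(observed_unix_ms: u64, segment_duration_ms: u64) -> Option<u64> {
    if segment_duration_ms == 0 {
        None
    } else {
        Some(observed_unix_ms / segment_duration_ms)
    }
}

/// group_sequence = observed_epoch_bucket * bucket_stride + fragment_slot
///
/// Returns None when the slot does not fit inside the stride (it would bleed
/// into the next bucket) or when the arithmetic would overflow.
pub fn group_sequence(
    observed_epoch_bucket: u64,
    bucket_stride: u64,
    fragment_slot: u64,
) -> Option<u64> {
    if fragment_slot >= bucket_stride {
        return None;
    }
    observed_epoch_bucket
        .checked_mul(bucket_stride)?
        .checked_add(fragment_slot)
}
```

Under these assumptions, a fragment observed at Unix time `1780924942797` ms with a 1001 ms cadence lands in bucket `1779145797`, and slot 0 of that bucket maps to `7287381184512`. Two publishers that observe the same bucket produce the same sequence regardless of when each process started.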
+
+A later live run on 2026-06-08 found a stricter local invariant before cross-publisher byte equality:
+each publisher must produce at most one hash for a given `source_node` and `group_sequence`.
+Production `publisher.m4s` samples for `la-kcop` and `la-ktla` showed multiple hashes from the same
+source in the same sequence bucket because real fMP4 fragments can arrive faster than the configured
+proof segment duration, and the writer rounded decode time into repeated buckets. The writer now
+uses a fixed per-epoch bucket stride and increments an in-bucket fragment slot when multiple timed
+fragments arrive inside the same proof duration. This keeps source-local manifests unique while
+allowing independently restarted publishers to align on the same observed wall-clock bucket.
+`ec-node archive-convergence` reports this separately as `source_local_divergent_sequences` so
+operator tooling can distinguish a self-contradicting publisher from two publishers that simply
+disagree about the same sequence.
+
+Because bucket-strided proof sequences intentionally leave numeric gaps, archive convergence uses
+the observed sparse sequence union for publisher-origin manifests. Dense contiguous sequence ranges
+remain available in the simulation layer when a model explicitly expects every integer sequence.
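The source-local invariant (at most one hash per `source_node` and `group_sequence`) can be checked independently of any cross-publisher comparison. This is a hedged sketch with a hypothetical function name and a simplified record shape of `(source_node, group_sequence, hash)` triples.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return every (source_node, group_sequence) pair that carries more than one
/// distinct hash, i.e. a publisher that contradicts itself.
pub fn source_local_divergent(records: &[(&str, u64, &str)]) -> Vec<(String, u64)> {
    let mut hashes: BTreeMap<(&str, u64), BTreeSet<&str>> = BTreeMap::new();
    for &(node, sequence, hash) in records {
        hashes.entry((node, sequence)).or_default().insert(hash);
    }
    hashes
        .into_iter()
        .filter(|(_, seen)| seen.len() > 1)
        .map(|((node, sequence), _)| (node.to_string(), sequence))
        .collect()
}
```

A different hash from another node at the same sequence does not appear in this list; that case is cross-publisher divergence and is counted separately.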
+
+The 2026-06-08 live `la-kcet/publisher.m4s` sample from Forge confirmed that both publishers now
+emit distinct source identities (`ec-node-c3546fa5abc3` and `ec-node-72cf1c3aa196`) with no missing
+source identity records on the sampled publisher-origin manifests. It also confirmed the remaining
+bug: 156 shared publisher-origin sequences had zero byte-for-byte BLAKE3 matches and 156 divergent
+hashes. The next production fix must align the publisher chunk clock and encoded fMP4 byte stream,
+not merely improve scrape or Grafana plumbing.
+
+After the wall-clock bucket hotpatch, the same live proof no longer has fake sparse-range missing
+IDs: `la-kcet/publisher.m4s` reported 376 observed proof sequences, zero missing source identities,
+zero source-local divergent sequences, and 234 divergent shared sequences. A byte-level sample for
+sequence `7287381184512` had different sizes, different BLAKE3 hashes, different `tfdt`
+base-media-decode-times (`210210` versus `0`), and different `mdat` payload prefixes. Across that
+sampled window there were zero common fragment hashes even when sequence IDs were ignored, proving
+that the remaining failure was independent-encoder media phase and fMP4 payload determinism, not an
+archive manifest identity bug.
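Comparing `tfdt` base-media-decode-times, as in the byte-level sample above, only needs a small ISO BMFF box walk. The sketch below is not the project's parser: the function names are hypothetical, and it reads only the first `traf` in the first `moof`, so multi-track fragments would need a loop over every `traf`.

```rust
/// Read a big-endian unsigned integer of up to 8 bytes.
fn be_uint(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |acc, b| (acc << 8) | u64::from(*b))
}

/// Scan sibling boxes in `buf`; return the payload of the first box of `kind`.
fn find_box<'a>(mut buf: &'a [u8], kind: &[u8; 4]) -> Option<&'a [u8]> {
    while buf.len() >= 8 {
        let (header, size) = match be_uint(&buf[0..4]) {
            0 => (8usize, buf.len() as u64), // size 0: box runs to end of data
            1 if buf.len() >= 16 => (16usize, be_uint(&buf[8..16])), // 64-bit largesize
            1 => return None,
            n => (8usize, n),
        };
        if size < header as u64 || size > buf.len() as u64 {
            return None; // truncated or malformed box
        }
        let size = size as usize;
        if buf[4..8] == kind[..] {
            return Some(&buf[header..size]);
        }
        buf = &buf[size..];
    }
    None
}

/// Return baseMediaDecodeTime from moof/traf/tfdt, if present.
pub fn read_tfdt(fragment: &[u8]) -> Option<u64> {
    let moof = find_box(fragment, b"moof")?;
    let traf = find_box(moof, b"traf")?;
    let tfdt = find_box(traf, b"tfdt")?;
    // tfdt is a FullBox: 1 byte version, 3 bytes flags, then the time field.
    match tfdt.first() {
        Some(&1) if tfdt.len() >= 12 => Some(be_uint(&tfdt[4..12])), // 64-bit
        Some(&0) if tfdt.len() >= 8 => Some(be_uint(&tfdt[4..8])),   // 32-bit
        _ => None,
    }
}
```

Running `read_tfdt` over the same sequence from two publishers makes a timestamp-only mismatch (such as `210210` versus `0`) visible before looking at `mdat` payload bytes.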
+
+A later `la-kcop/publisher.m4s` sample exposed a stricter live-source bug: source-window proof
+records were using unsynced MPEG-TS PCR chunk indexes as `group_sequence` when the OTA UTC clock was
+unavailable, causing restart-dependent jumps such as 93M, 135M, 341M, and 390M. The source-proof
+writer now uses the chunk UTC start only when the chopper reports synced timing; otherwise it falls
+back to the local wall-clock window start, and rewrites fMP4 `tfdt` onto that shared window before
+hashing. The live HTTP proof worker also retries transient source opens/reader failures in unbounded
+live mode, so a tuner `503` or malformed TS burst is skipped/retried instead of killing the
+publisher proof process.
+
+The synced source-window clock must use the chopper's exact global chunk index, not integer UTC
+seconds. A 1001 ms proof cadence makes whole-second UTC start metadata lossy: adjacent source
+windows can share the same `utc_start_unix`, which caused one publisher to write several different
+hashes under the same source-local `group_sequence`. Synced chunks therefore use
+`ChunkTiming.chunk_index` directly; only unsynced chunks fall back to local wall-clock receipt.
+The live source-window proof writer also keeps subfragment slot allocation as stream state instead
+of per-chunk state. Real source windows can be emitted in more than one proof chunk for the same
+media timing sequence; resetting the slot counter for every chunk reused the same
+`group_sequence` and made one healthy publisher look self-divergent. The counter is bounded so the
+long-running live worker does not grow state without bound.
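Keeping slot allocation as stream state with bounded memory could be done along these lines. `SlotAllocator`, its eviction policy (drop the oldest bucket and refuse anything older afterwards), and the `stride` parameter are assumptions of this sketch, not the writer's actual design.

```rust
use std::collections::BTreeMap;

/// Stream-level slot allocator: the next free slot per bucket survives across
/// proof chunks, and only the newest `max_tracked` buckets are remembered.
pub struct SlotAllocator {
    next_slot: BTreeMap<u64, u64>,
    max_tracked: usize,
    stride: u64,
    /// Buckets below this value were evicted and are refused, so a slot can
    /// never be handed out twice.
    floor: u64,
}

impl SlotAllocator {
    pub fn new(stride: u64, max_tracked: usize) -> Self {
        SlotAllocator { next_slot: BTreeMap::new(), max_tracked: max_tracked.max(1), stride, floor: 0 }
    }

    /// Allocate the next group_sequence inside `bucket`, or None when the
    /// bucket is too old, its slots are exhausted, or the value would overflow.
    pub fn allocate(&mut self, bucket: u64) -> Option<u64> {
        if bucket < self.floor {
            return None;
        }
        let slot = self.next_slot.get(&bucket).copied().unwrap_or(0);
        if slot >= self.stride {
            return None;
        }
        let sequence = bucket.checked_mul(self.stride)?.checked_add(slot)?;
        self.next_slot.insert(bucket, slot + 1);
        // Bound memory: forget the oldest buckets first.
        while self.next_slot.len() > self.max_tracked {
            if let Some(oldest) = self.next_slot.keys().next().copied() {
                self.next_slot.remove(&oldest);
                self.floor = oldest + 1;
            }
        }
        Some(sequence)
    }
}
```

Because the counter belongs to the stream rather than to a single chunk, a second proof chunk for the same bucket continues at the next slot instead of reusing slot 0.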
+
+`wt-publish` now has an explicit Unix-epoch start boundary, defaulting to the publisher-origin proof
+cadence. After relay setup and immediately before spawning ffmpeg it waits until the next boundary,
+so a newly restarted duplicate publisher starts its forced-keyframe clock on the same global cadence
+as already-running publishers.
+This does not by itself prove byte equality; it removes the local-process-start phase error from the
+live publisher path and gives rollout measurement a deterministic knob
+(`--publisher-start-boundary-ms 0` disables it). The live ffmpeg argument plan is factored into a
+Rust unit-testable helper so future timestamp/keyframe changes are pinned in `ec-node` instead of
+being inferred from node-agent process strings or production samples.
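The start-boundary wait reduces to modular arithmetic on the Unix clock. The following is a minimal sketch with hypothetical function names; treating a start that falls exactly on a boundary as "no wait" is an assumption, and a boundary of `0` disables the wait, matching the knob described above.

```rust
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Milliseconds to wait so that the encoder starts on the next multiple of
/// `boundary_ms` since the Unix epoch. A boundary of 0 disables the wait.
pub fn wait_ms_until_next_boundary(now_unix_ms: u64, boundary_ms: u64) -> u64 {
    if boundary_ms == 0 {
        return 0;
    }
    match now_unix_ms % boundary_ms {
        0 => 0, // already on a boundary
        remainder => boundary_ms - remainder,
    }
}

/// Block the current thread until the next boundary on the local wall clock.
/// Call this immediately before spawning the encoder process.
pub fn sleep_until_next_boundary(boundary_ms: u64) {
    let now_unix_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0);
    thread::sleep(Duration::from_millis(wait_ms_until_next_boundary(now_unix_ms, boundary_ms)));
}
```

The wait is only as accurate as each node's clock synchronization and the time it takes to spawn the encoder afterwards, which is one reason it cannot prove byte equality on its own.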
+
+The first post-start-clock live sample still failed duplicate byte identity: both publishers landed
+in the same wall-clock proof bucket, but one fragment carried `tfdt=390390` while the other carried
+`tfdt=30030`, matching the staggered restart gap. Their `mdat` prefixes differed too, which means a
+continuous x264 encoder keeps enough local history that a later restart cannot prove byte equality
+merely by joining the same wall-clock cadence. The live profile therefore enables x264
+`stitchable=1` alongside closed GOP, no scenecut, no B-frames, no lookahead, and one thread. If that
+still does not converge in production, the next fix is a deliberately stateless per-fragment encode
+or a Rust-owned media clock/segmenter that resets encoder history at each proof boundary.
+
+The follow-up production hotpatch moved the start-boundary wait to immediately before ffmpeg spawn,
+enabled `stitchable=1`, and restarted both publisher nodes in the same batch. The latest `la-kcet`
+sample still reported zero matching duplicate hashes with no missing source identity and no
+source-local divergence. A final sampled shared sequence differed by hundreds of milliseconds of
+receive time and by media size (`439737` versus `270283` bytes for the video fragment), so the
+remaining mismatch is not just MP4 timestamp metadata. Production duplicate proof now needs a
+stateless fragment boundary: either encode each proof segment from the same bounded source window
+with fresh encoder state, or make the Rust media pipeline own exact frame-window capture before
+calling ffmpeg/x264.
+
+Archive manifests now carry optional fMP4 media timing for publisher-origin fragments. The
+`archive-convergence` gate treats equal archive group sequence IDs with different media sequence or
+decode-time metadata as `media_sequence_conflict`, even if the byte hash happens to match. This keeps
+production proof aligned with the Rust simulation model: a duplicate publisher only proves the same
+broadcast moment when the archive sequence and media window agree.
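The ordering of checks implied above (timing conflict first, hash comparison second, missing timing as a non-blocking observation) can be sketched as follows. `MediaTiming`, `ProofRecord`, `Verdict`, and `judge` are hypothetical names; the caller is assumed to pass two records that already share one archive group sequence.

```rust
/// Optional fMP4 timing carried by a publisher-origin archive record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MediaTiming {
    pub media_sequence: u64,
    pub decode_time: u64,
}

#[derive(Debug, Clone)]
pub struct ProofRecord {
    pub hash: String,
    pub timing: Option<MediaTiming>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Match,
    HashDivergence,
    /// Same group sequence, different media window, regardless of the hash.
    MediaSequenceConflict,
}

/// Judge two records from different sources at the SAME group sequence.
/// The boolean reports missing timing on either side; it does not block a match.
pub fn judge(a: &ProofRecord, b: &ProofRecord) -> (Verdict, bool) {
    let timing_missing = a.timing.is_none() || b.timing.is_none();
    let verdict = match (a.timing, b.timing) {
        (Some(x), Some(y)) if x != y => Verdict::MediaSequenceConflict,
        _ if a.hash != b.hash => Verdict::HashDivergence,
        _ => Verdict::Match,
    };
    (verdict, timing_missing)
}
```

Checking timing before the hash is what lets a coincidental hash match at mismatched media windows still fail the gate.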
+
+The first stateless proof primitives are now in `ec-node`. `publisher-proof-segment` takes one
+bounded MPEG-TS source-clock window, runs a fresh deterministic x264/AAC fMP4 encode, splits the
+result into init bytes and media fragments, and emits BLAKE3 hashes for each. `publisher-proof-windows`
+uses the Rust MPEG-TS source-clock splitter first, then fresh-encodes each bounded window and reports
+per-window source TS, init, and media hashes. Proof windows carry explicit MPEG-TS decoder context
+with `--preroll-packets`, defaulting to the repo-owned `WT_PUBLISH_PROOF_PREROLL_PACKETS` budget, so
+mid-GOP windows do not silently depend on best-effort decoder recovery. Focused Rust tests
+fresh-encode the same bounded input and the same finite source-window campaign twice and assert
+byte-for-byte identical proof hashes.
+
+`publisher-proof-duplicates` is the single-node duplicate-publisher gate for the stateless path. It
+runs `publisher-proof-windows` independently under at least two publisher identity labels, defaults
+to `publisher-a` and `publisher-b`, and compares source TS, init, and media fragment BLAKE3 hashes
+for every source-clock window. `--require-ok` exits non-zero unless every compared window matches,
+and duplicate publisher labels are rejected so the proof cannot accidentally collapse to one source
+identity. `publisher-proof-compare` is the cross-machine stateless proof gate: each publisher can run
+`publisher-proof-windows` against the same bounded source TS file locally, copy the JSON report back
+to the operator host, and compare the reports by named publisher. It rejects mismatched chunk cadence,
+missing windows, source TS hash mismatches, init hash mismatches, media fragment hash mismatches, and
+empty media windows.
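The rejection rules listed for `publisher-proof-compare` can be modelled on two in-memory reports. The structs below are hypothetical and do not reflect the real JSON report schema; windows are keyed by an assumed integer window index.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-window proof: hashes of source TS, init, and media fragments.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WindowProof {
    pub source_ts_hash: String,
    pub init_hash: String,
    pub media_hashes: Vec<String>,
}

#[derive(Debug, Clone)]
pub struct ProofReport {
    pub publisher: String,
    pub chunk_ms: u64,
    pub windows: BTreeMap<u64, WindowProof>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch {
    ChunkCadence,
    MissingWindow(u64),
    SourceTs(u64),
    Init(u64),
    Media(u64),
    EmptyMedia(u64),
}

/// Compare two reports window by window; an empty result means they agree.
pub fn compare(a: &ProofReport, b: &ProofReport) -> Vec<Mismatch> {
    let mut out = Vec::new();
    if a.chunk_ms != b.chunk_ms {
        out.push(Mismatch::ChunkCadence);
    }
    let ids: BTreeSet<u64> = a.windows.keys().chain(b.windows.keys()).copied().collect();
    for id in ids {
        match (a.windows.get(&id), b.windows.get(&id)) {
            (Some(x), Some(y)) => {
                if x.source_ts_hash != y.source_ts_hash {
                    out.push(Mismatch::SourceTs(id));
                }
                if x.init_hash != y.init_hash {
                    out.push(Mismatch::Init(id));
                }
                if x.media_hashes.is_empty() || y.media_hashes.is_empty() {
                    out.push(Mismatch::EmptyMedia(id));
                } else if x.media_hashes != y.media_hashes {
                    out.push(Mismatch::Media(id));
                }
            }
            _ => out.push(Mismatch::MissingWindow(id)),
        }
    }
    out
}
```

Two reports with no windows at all compare as equal here; a real gate would presumably treat that as a failure too, but the source does not say so and the sketch leaves it out.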
+
+`publisher-proof-remote-compare` is the production operator harness for that cross-machine gate. It
+copies one bounded `.ts` proof input to each named SSH target, runs `ec-node publisher-proof-windows`
+on the target, stores each returned JSON report under the local output directory, writes a
+`compare.json`, and returns the existing compare report with upload/proof timing. Remote labels use
+the same single-component validation as publisher identities, remote proof roots are constrained to
+`/tmp/every-channel-*`, and cleanup is opt-in so the generated proof files remain inspectable unless
+the operator explicitly requests removal. This keeps the live proof path in Rust without making the
+Python node-agent a new oracle. It proves the machine/runtime/compiler boundary without requiring
+the two NUCs to share a live tuner at the exact same instant.
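The label and proof-root constraints for the remote harness might be enforced with checks like these. The exact character set accepted by the real validation is not given in the source, so the allow-list below (ASCII alphanumerics, `-`, `_`, `.`) and the requirement of a non-empty suffix after `/tmp/every-channel-` are assumptions, as are the function names.

```rust
/// True when `label` is a single path component under the assumed allow-list.
pub fn is_single_component(label: &str) -> bool {
    !label.is_empty()
        && label != "."
        && label != ".."
        && label
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
}

/// True when `path` matches `/tmp/every-channel-*` with a single-component
/// suffix, so it cannot escape `/tmp` via `/` or `..`.
pub fn is_allowed_proof_root(path: &str) -> bool {
    match path.strip_prefix("/tmp/every-channel-") {
        Some(suffix) => is_single_component(suffix),
        None => false,
    }
}
```

Validating both values before they reach an SSH command line keeps an operator typo or a hostile label from turning a cleanup step into a delete outside the proof directory.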
+
+`publisher-proof-archive-source` is the live archive implementation of the same proof model. It can
+read local source files directly, read plain HTTP MPEG-TS bodies directly for HDHomeRun-style
+sources, or fall back to an ffmpeg MPEG-TS copy reader for other inputs. Each emitted source-clock
+window is encoded with fresh proof state, archived as CAS-backed `publisher.m4s` records, and mapped
+to source-clock group sequences with explicit media timing metadata. A focused Rust regression now
+archives the same bounded TS input as two source nodes, then runs `archive-convergence` against the
+two manifest roots and requires full duplicate convergence with zero divergent or source-local
+divergent sequences.
+
+Forge `ci-gates` now runs the `publisher_proof` and `archive_convergence` Rust filters before the
+distributed simulator campaign, so single-node byte-for-byte determinism, source-window archive
+proof semantics, and duplicate archive convergence are checked before production rollout evidence is
+considered. The next production step is to deploy the updated node binary and let fresh
+`publisher.m4s` source-window records age into the Grafana scrape window so live duplicate metrics
+can replace the older continuous-encoder divergence.
--- a/evolution/proposals/ECP-0157-rust-simulation-testing.md
+++ b/evolution/proposals/ECP-0157-rust-simulation-testing.md
@ -0,0 +1,158 @@
+# ECP-0157: Rust Simulation Testing
+
+Status: Draft
+
+## Context
+
+Production is now fast enough to expose distributed bugs quickly, but it is still the wrong first
+place to discover scheduler, archive, and duplicate-publisher invariants. The Python node-agent also
+made this worse by putting core control behavior outside the already-built Rust node binary.
+
+## Decision
+
+Add a small deterministic simulation layer in `ec-core` and use it for distributed media invariants:
+
+- `ec-node` remains the runtime owner for node behavior.
+- Tests model logical time, delayed delivery, backfill, duplicate publishers, and archive
+  convergence in Rust.
+- Simulation scenarios are seed-replayable and include deterministic jitter, transient drops,
+  partition windows, publisher outage/restart windows, backfill retries, and encoder drift faults.
+- A failing simulation must print or carry a replay hint so the exact schedule can be rerun.
+- Simulation reports include deterministic execution history so a failure has an ordered event trace,
+  not only a final assertion.
+- Simulation campaigns run many seed schedules in one fast test and preserve the first failing seed,
+  invariant report, and final state as the failure artifact.
+- Campaign execution has a reusable seeded runner so new models can share replay/failure accounting
+  instead of copying bespoke loops.
+- First failures are automatically shrunk where the model supports it. For duplicate publishers the
+  shrinker removes irrelevant partitions, publisher outages, timing jitter, transient drops, and
+  excess media sequence range while keeping the original invariant unchanged.
+- Invariants are explicit checks, not implicit test prose: duplicate source count, missing
+  sequences, divergent hashes, missing media timing, conflicting media timing, complete duplicate
+  coverage, and convergence-deadline budgets.
+- Media identity is checked by BLAKE3 hashes for stream, rendition, track, sequence, profile, and
+  source-material identity.
+- Media timing is part of the proof model. Matching hashes are not considered a complete duplicate
+  proof unless both publishers also expose a shared logical media clock for the chunk.
+- Source-material identity is separate from stream metadata. Two publishers can advertise the same
+  channel, sequence, timing, and encoder profile while still encoding different RF/source windows;
+  that must fail in simulation before production archive comparisons burn wall-clock time.
+- Publisher-origin archive `group_sequence` is derived from parsed media-time identity plus stable
+  track id, not local receive time. Receive time is telemetry; it is not proof that two publishers
+  archived the same broadcast moment.
+- Live publisher archive proof normalizes fMP4 `tfdt` to the Unix media slot before hashing a
+  fragment. The first fragment for each track anchors the process-local media clock to wall-clock
+  time; later fragments preserve ffmpeg's media cadence from that origin. ffmpeg still runs with
+  wall-clock timestamp input enabled where possible, but the Rust archive writer is the authority
+  for the proof clock when source MPEG-TS timestamps are process-relative.
+- Archive `group_sequence` includes a stable subfragment slot inside each
+  `(track_id, media_sequence)` pair, because audio can legitimately emit multiple fragments within
+  one media slot and those must compare in order instead of colliding as source-local divergences.
+- Duplicate-publisher scenarios model publisher content phase separately from advertised archive
+  sequence. A publisher that starts its local encoder at a different content phase must fail fast in
+  simulation, because production fragments with the same local sequence are not proof of the same
+  broadcast moment unless the chunk clock is shared.
+- `ec-node sim-duplicate-publishers` runs the same campaign model from the compiled Rust binary and
+  emits JSON suitable for CI artifacts and rollout gates.
+- `ec-node sim-duplicate-publishers --failure-artifact <path>` writes the first failing campaign as
+  a replayable JSON artifact with the shrunk scenario, invariant report, event trace, shrink steps,
+  and a command hint for replaying `replay_scenario` through `--scenario-json -`.
+- `ec-node sim-duplicate-publishers --scenario-json <path-or->` replays an exact serialized
+  `DuplicatePublisherScenario`, so a shrunk failure from CI or production investigation can be rerun
+  without reconstructing command-line flags.
+- `ec-node sim-duplicate-publishers` can inject timing faults directly with
+  `--missing-media-timing-publisher NODE` and `--publisher-media-time-offset NODE:OFFSET_MS`, so
+  the current production proof class can be reproduced without hand-writing scenario JSON.
+- `ec-node sim-duplicate-publishers` and `ec-node sim-system` can inject source-window faults with
+  `--publisher-source-material NODE:MATERIAL_ID`. Any campaign with multiple source-material ids
+  reports source-material mismatch observations instead of leaving operators to infer that class
+  from divergent hashes.
+- `ec-node archive-convergence` reads existing archive manifest JSONL and applies the same
+  convergence semantics to real duplicate publisher outputs.
+- Control-plane simulation models logical nodes, seeded gossip fanout, delivery jitter, transient
+  drops, node-specific partitions, node outages, duplicate deliveries, and propagation deadlines.
+- `ec-node sim-control-plane` runs the control-plane model from the compiled Rust binary and emits
+  replayable JSON with the first failing seed, scenario, invariant report, and ordered trace.
+- Control-plane campaign reports track max propagation time, max delivery time, dropped messages,
+  partition-delayed messages, outage-delayed messages, and duplicate messages, so prod rollout
+  measurements have a fast simulation baseline.
+- System simulation composes control-plane propagation with duplicate-publisher media production.
+  Control gossip produces per-publisher activation times; the media workload then proves that delayed
+  schedule propagation still converges when publishers use the global media sequence clock and fails
+  to converge when they derive chunk identity from local activation time.
+- `ec-node sim-system` runs that composed workload from the deployed node binary. Its default
+  campaign models the current publisher topology class and can switch `--sequence-clock` between
+  `global` and `local-activation` to reproduce the exact class of duplicate-publisher phase bug
+  before waiting for production samples.
+- `ec-node sim-system --fault-profile foundationdb` uses a FoundationDB-style fault profile: each
+  seed generates a different but replayable cluster schedule with randomized control partitions, node
+  outages, transient gossip drops, duplicate messages, media partitions, publisher outages, and
+  archive backfill pressure.
+- The FoundationDB-style profile must also have an explicit negative regression for
+  `local-activation` sequence clocks, so the model proves the current production failure class is
+  caught in Rust before any rollout waits for live fragments.
+- `ec-node sim-system --failure-artifact <path>` writes the first failing composed system schedule
+  as replayable JSON, including the exact control/media scenario, invariant report, ordered trace,
+  and command hint for rerunning `--scenario-json -`.
+- System campaign reports must include fault coverage counters, not just pass/fail. A fast campaign
+  is only useful if it proves that the simulated run actually exercised the failure modes operators
+  care about.
+- System campaign reports also aggregate publisher phase-offset observations. A production-like
+  divergence caused by local activation clocks should identify itself as a phase bug in the campaign
+  JSON instead of requiring operators to infer it from divergent hashes alone.
+- System campaign reports also aggregate source-material mismatch observations. A production-like
+  divergence caused by independent tuner/source windows should identify itself as a source-material
+  bug in the campaign JSON instead of being confused with codec nondeterminism.
+- System and duplicate-publisher reports aggregate missing media-timing records and media-timing
+  conflicts, so the live failure class where fragments arrive without a usable media clock is visible
+  in fast Rust simulation output.
+- FoundationDB-profile `sim-system` campaigns require fault coverage by default: control transient
+  drops, partition delays, node outage delays, duplicate messages, media transient drops, media
+  partition delays, publisher outages, backfill, and observed convergence timing must all appear in
+  the campaign report. A campaign that passes invariants but misses any of these classes is reported
+  as a weak simulation, not a green rollout gate.
+- FoundationDB-profile coverage is breadth-gated, not only boolean-gated. By default at least
+  `max(2, iterations / 32)` seeds must exercise every required distributed fault class; operators
+  can raise that floor with `--min-fault-seed-coverage` for longer scientific campaigns.
+- Campaign reports track both event totals and seed counts per fault class, plus a bounded list of
+  the slowest system schedules with replay hints. This makes green runs inspectable: operators can
+  see how broadly the randomized schedule space was exercised and which seeds define the current
+  latency tail.
+- System campaign reports also aggregate deterministic simulated convergence time and trace event
+  counts. `ec-node sim-system` stamps wall-clock execution telemetry around the campaign so a run
+  reports iterations per second, simulated system seconds per wall second, and trace events per
+  second without putting wall-clock data into the replayed scenario itself.
+- `sim-system --failure-artifact <path>` writes an artifact for weak coverage as well as invariant
+  failures, so CI can preserve evidence when a campaign was too small or too narrow to exercise the
+  required distributed faults.
+- Forge `ci-gates` runs the Rust system simulator tests and a 1024-seed
+  `sim-system --fault-profile foundationdb` campaign from the compiled `ec-node` binary before web
+  build/deploy gates. This keeps the fast randomized check ahead of production rollout evidence.
+- Simulation failures must be actionable before any matching production rollout is considered
+  healthy.
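+
+The breadth gate described above can be sketched as follows. This is a minimal illustration, not
+the `ec-node` implementation: `FaultClassCoverage`, `default_min_fault_seed_coverage`, and
+`weak_fault_classes` are hypothetical names, `iterations / 32` is assumed to be integer division,
+and the operator override is assumed to only ever raise the default floor.
+
+```rust
+/// Hypothetical per-class tally: how many seeds in a campaign exercised one fault class.
+#[derive(Debug, Clone, PartialEq)]
+pub struct FaultClassCoverage {
+    pub name: &'static str,
+    pub seeds_exercised: u64,
+}
+
+/// Default breadth floor from the text: max(2, iterations / 32).
+/// Assumption: the division is integer division.
+pub fn default_min_fault_seed_coverage(iterations: u64) -> u64 {
+    std::cmp::max(2, iterations / 32)
+}
+
+/// Returns the names of fault classes whose seed count is below the floor.
+/// Assumption: an operator-supplied floor can raise the default but never lower it.
+pub fn weak_fault_classes(
+    coverage: &[FaultClassCoverage],
+    iterations: u64,
+    operator_floor: Option<u64>,
+) -> Vec<&'static str> {
+    let default_floor = default_min_fault_seed_coverage(iterations);
+    let floor = operator_floor.map_or(default_floor, |f| f.max(default_floor));
+    coverage
+        .iter()
+        .filter(|class| class.seeds_exercised < floor)
+        .map(|class| class.name)
+        .collect()
+}
+```
+
+Under these assumptions a 1024-seed campaign needs at least 32 seeds per required fault class, and
+a 10-seed campaign still needs 2. A non-empty result from `weak_fault_classes` would correspond to
+a weak simulation rather than a green rollout gate.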
+
+## Consequences
+
+We get FoundationDB-style pressure in a much smaller shape: many deterministic failure schedules can
+run as normal Rust tests without booting machines. The first media model covers duplicate publisher
+convergence, network partitions, transient loss, publisher restart/backfill, convergence latency,
+encoder drift, and publisher phase alignment, and the first runtime command applies it to archive
+manifests. The first control model covers gossip propagation across relays and nodes under dropped,
+delayed, duplicated, partitioned, and outage-delayed control messages. The shrink/replay path makes
+supported failures small enough to debug before they become production event archaeology; exact
+scenario JSON is the replay contract. Later models can add tuner scheduling, relay cache eviction,
+and image rollout state machines. The composed system model is the first workload-level step: it
+checks the boundary between control-plane speed and media determinism, which is where production
+duplicate publishers are currently most fragile.
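+
+The property that makes these schedules replayable is that a seed fully determines the fault
+schedule. The sketch below illustrates that idea only; it is not the simulator's model.
+`FaultKind`, `FaultEvent`, and `schedule_for_seed` are hypothetical names, and SplitMix64 is used
+here as a small self-contained generator, not necessarily the one `ec-node` uses.
+
+```rust
+/// Hypothetical fault classes, loosely following the control-model faults named above.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum FaultKind {
+    TransientDrop,
+    Partition,
+    NodeOutage,
+    DuplicateDelivery,
+}
+
+/// One scheduled fault at a simulated time, targeting one logical node.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct FaultEvent {
+    pub at_ms: u64,
+    pub node: u8,
+    pub kind: FaultKind,
+}
+
+/// SplitMix64: a tiny deterministic generator, chosen here only to avoid external crates.
+pub struct SplitMix64(pub u64);
+
+impl SplitMix64 {
+    pub fn next_u64(&mut self) -> u64 {
+        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
+        let mut z = self.0;
+        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
+        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
+        z ^ (z >> 31)
+    }
+}
+
+/// Derives a fault schedule from the seed alone: no wall clock, no global state.
+pub fn schedule_for_seed(seed: u64, events: usize, nodes: u8) -> Vec<FaultEvent> {
+    let mut rng = SplitMix64(seed);
+    let mut now_ms = 0u64;
+    (0..events)
+        .map(|_| {
+            // Simulated time only moves forward, by a seed-derived step under 500 ms.
+            now_ms += rng.next_u64() % 500;
+            let node = (rng.next_u64() % u64::from(nodes.max(1))) as u8;
+            let kind = match rng.next_u64() % 4 {
+                0 => FaultKind::TransientDrop,
+                1 => FaultKind::Partition,
+                2 => FaultKind::NodeOutage,
+                _ => FaultKind::DuplicateDelivery,
+            };
+            FaultEvent { at_ms: now_ms, node, kind }
+        })
+        .collect()
+}
+```
+
+Calling `schedule_for_seed` twice with the same arguments yields identical schedules, which is the
+property a replayable failing seed relies on. Keeping wall-clock data out of the generator mirrors
+the rule that wall-clock telemetry stays out of the replayed scenario.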
+
+## Alternatives considered
+
+- Keep writing production probes only. Rejected because probes prove what happened once, not what
+  should happen across many fault schedules.
+- Extend the Python node-agent as the simulation oracle. Rejected because the image should get
+  thinner and the runtime behavior belongs in the Rust node.
+
+## Rollout/teardown
+
+Roll forward by adding simulation tests next to each new distributed invariant. Roll back by keeping
+the production probes; the simulation module is library-only and has no runtime service impact.