Seeded stochastic tests
Tests assert invariants on seeded Monte Carlo runs: percentile ordering, monotonicity, exact-zero self-delta, and medallion round-trips.
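A minimal sketch of what such an invariant test could look like, assuming a lognormal stand-in for the cost model. The names `simulate_costs`, `percentile`, `p90_delta`, and `check_invariants` are hypothetical, and "monotonicity" is read here as "a larger workload scale never lowers the P90", which is an assumption. Medallion round-trips are not covered.

```python
import math
import random


def simulate_costs(seed: int, scale: float = 1.0, n: int = 2000) -> list[float]:
    """Hypothetical stand-in for a Monte Carlo cost run (lognormal draws)."""
    rng = random.Random(seed)  # a private, seeded generator keeps runs repeatable
    return [scale * rng.lognormvariate(0.0, 0.5) for _ in range(n)]


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def p90_delta(base_seed: int, head_seed: int) -> float:
    """P90 of the head run minus P90 of the base run."""
    head = percentile(simulate_costs(head_seed), 90)
    base = percentile(simulate_costs(base_seed), 90)
    return head - base


def check_invariants(seed: int = 1234) -> tuple[float, float, float]:
    samples = simulate_costs(seed)
    p10, p50, p90 = (percentile(samples, q) for q in (10, 50, 90))

    # Percentile ordering: P10 <= P50 <= P90 must hold for any sample set.
    assert p10 <= p50 <= p90

    # Monotonicity (assumed meaning): doubling the workload scale with the
    # same seed must not lower the P90.
    assert percentile(simulate_costs(seed, scale=2.0), 90) >= p90

    # Exact-zero self-delta: comparing a run against itself gives exactly 0.0,
    # not merely something close to zero.
    assert p90_delta(seed, seed) == 0.0

    return p10, p50, p90
```

Because every draw comes from a generator seeded inside the function, the self-delta check can use strict equality instead of a tolerance.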
Trust
The product declares what is measured, inferred, assumed, excluded, or unpriced. That honesty is a sales asset.
Trust posture
Class1 does not sell certainty. It sells a disciplined estimate with a declared basis. The report shows the conditional risk band and the maturity class separately, because those are different uncertainties. A P90 number can be mathematically sharp and still depend on immature inputs; the buyer deserves to see both facts.
The product also refuses to confuse events with validation. Usage events help measure retry tails, fallback rates, output distributions, and demand spikes. Estimate-actual pairs validate forecasts. That distinction is what prevents a customer with lots of telemetry but few closed loops from claiming Class 1 maturity too early.
The trust page exists because the product will be used in merge decisions. If a gate can block a PR, the team needs to know how it behaves, when it does nothing, which assumptions are in scope, which data is frozen, and which open items remain outside the current product.
Forecasts use frozen, effective-dated snapshots, and past estimates are never silently rewritten. As the price or capability basis ages, estimate decay widens the accuracy band and steps the class back toward Class 5, one step per staleness half-life. This is Class1's extension of AACE classification, so that confidence cannot outlive its basis.
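The class step-back can be sketched as follows, assuming that "one step per staleness half-life" means one class step for each whole half-life elapsed. The function name `decayed_class` and its parameters are hypothetical. The widening of the accuracy band is not modeled, since its rule is not stated here.

```python
import math

CLASS_BEST = 1   # most mature estimate class
CLASS_WORST = 5  # least mature estimate class


def decayed_class(base_class: int, age_days: float, half_life_days: float) -> int:
    """Step an estimate class toward Class 5 as its basis ages.

    Assumption: one step per whole staleness half-life elapsed.
    """
    if not CLASS_BEST <= base_class <= CLASS_WORST:
        raise ValueError("base_class must be between 1 and 5")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")

    # Whole half-lives elapsed; a negative age is treated as zero.
    steps = math.floor(max(age_days, 0.0) / half_life_days)

    # Decay never moves past Class 5.
    return min(CLASS_WORST, base_class + steps)
```

For example, under this reading a Class 2 estimate whose basis is 45 days old, with a 30-day half-life, reports as Class 3, and the same estimate at 200 days reports as Class 5.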
The P90 is contingency: known-unknowns the engine can model from the diff and workload (retry storms, output drift, context growth, fallback, demand spikes). It is not management reserve for unknown-unknowns; future scope growth or new features re-enter only through the actuals loop. You approve against a conditional risk band, not a total-project allowance.
Ingest preserves provider events in raw_payload and keeps cached cost components disjoint.
Events measure factor shapes. Estimate class improves only from scarce estimate-actual pairs.
The policy gate cannot block a PR whose P90 delta is zero or negative.
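One way to make that guarantee hold by construction is to check the sign of the delta before any threshold is consulted. The sketch below assumes a single numeric blocking threshold; `GateDecision`, `evaluate_gate`, and `block_threshold` are hypothetical names, not the product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    blocked: bool
    reason: str


def evaluate_gate(p90_delta: float, block_threshold: float) -> GateDecision:
    """Decide whether a PR is blocked, given its P90 cost delta."""
    # Guard first: a zero or negative delta can never block, regardless of
    # how the threshold is configured.
    if p90_delta <= 0:
        return GateDecision(False, "P90 delta is zero or negative")

    # Assumed policy: block only when the delta exceeds the threshold.
    if p90_delta > block_threshold:
        return GateDecision(True, "P90 delta exceeds threshold")

    return GateDecision(False, "P90 delta within threshold")
```

Placing the guard first means that even a misconfigured negative threshold cannot turn a cost-neutral or cost-saving PR into a blocked one.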
GOAL.md and NEEDS_HUMAN.md keep L2/L3 items out of marketing fantasy.
Evidence checklist
Current open items
Proof command
GOAL checkboxes are not enough. A task counts only when the acceptance test exists and the suite is green.
make test
make demo
make capability
make footprint