Trust

Class1 sells control by refusing fake certainty.

The product declares what is measured, inferred, assumed, excluded, or unpriced. That honesty is a sales asset.

6,924 priced model rows (effective 2026-06-06)

7,389 model metadata rows (drop-nothing spec sheet)

34 coding grades (SWE-Bench Verified, deduped)

862 test functions in 98 test files

103 Python modules (engine, takeoff, ledger, organism)

Trust posture

The fastest way to lose a technical buyer is to pretend early estimates are exact.

Class1 does not sell certainty. It sells a disciplined estimate with a declared basis. The report shows the conditional risk band and the maturity class separately, because those are different uncertainties. A P90 number can be mathematically sharp and still depend on immature inputs; the buyer deserves to see both facts.

The product also refuses to confuse events with validation. Usage events help measure retry tails, fallback rates, output distributions, and demand spikes. Estimate-actual pairs validate forecasts. That distinction is what prevents a customer with lots of telemetry but few closed loops from claiming Class 1 maturity too early.

The trust page exists because the product will be used in merge decisions. If a gate can block a PR, the team needs to know how it behaves, when it does nothing, which assumptions are in scope, which data is frozen, and which open items remain outside the current product.

Seeded stochastic tests

Tests assert invariants on seeded Monte Carlo runs: percentile ordering, monotonicity, exact-zero self-delta, and medallion round-trips.
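A minimal sketch of what a seeded invariant test can look like, assuming a toy lognormal cost model. The names simulate_costs, percentile, and p90_delta are hypothetical and are not taken from the repo; medallion round-trips are not covered here.

```python
import random


def simulate_costs(calls: int, unit_cost: float, seed: int, n: int = 5_000) -> list[float]:
    """Toy Monte Carlo: base spend times a lognormal noise multiplier (assumed model)."""
    rng = random.Random(seed)  # local RNG, so the seed fully determines the run
    base = calls * unit_cost
    return [base * rng.lognormvariate(0.0, 0.5) for _ in range(n)]


def percentile(draws: list[float], q: float) -> float:
    """Simple empirical percentile: pick the sorted draw at rank q."""
    ordered = sorted(draws)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def p90_delta(before: list[float], after: list[float]) -> float:
    return percentile(after, 0.9) - percentile(before, 0.9)


def check_invariants(seed: int = 1234) -> None:
    base = simulate_costs(calls=1000, unit_cost=0.002, seed=seed)
    same = simulate_costs(calls=1000, unit_cost=0.002, seed=seed)
    bigger = simulate_costs(calls=1500, unit_cost=0.002, seed=seed)

    # Percentile ordering: P50 <= P90 <= P99 on any run.
    p50, p90, p99 = (percentile(base, q) for q in (0.5, 0.9, 0.99))
    assert p50 <= p90 <= p99

    # Monotonicity: with the same seed, more calls never lowers the P90.
    assert percentile(bigger, 0.9) >= p90

    # Exact-zero self-delta: identical inputs and seed give a delta of exactly 0.0.
    assert p90_delta(base, same) == 0.0


check_invariants()
```

None of these assertions mentions a dollar figure, so a change to the noise model would not break them unless it broke the invariant itself.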

Frozen basis, decaying class

Forecasts use frozen, effective-dated snapshots, and past estimates are never silently rewritten. As the price or capability basis ages, estimate decay widens the accuracy band and steps the class back toward Class 5, one step per staleness half-life. This is Class1's extension of AACE classification: confidence cannot outlive its basis.
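The class step-back can be sketched as follows. This is an illustration only: the function name decayed_class and the 90-day half-life are assumptions, and the band-widening rule is left out because the page does not state its formula.

```python
from datetime import date, timedelta

WORST_CLASS = 5  # Class 5 is the least mature class named on this page


def decayed_class(base_class: int, snapshot_date: date, as_of: date, half_life_days: int) -> int:
    """Step the estimate class back one level per elapsed staleness half-life."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if not 1 <= base_class <= WORST_CLASS:
        raise ValueError("base_class must be between 1 and 5")
    age_days = max(0, (as_of - snapshot_date).days)  # a future-dated basis has no decay
    steps = age_days // half_life_days               # whole half-lives elapsed
    return min(WORST_CLASS, base_class + steps)      # never decays past Class 5


# Hypothetical usage: a Class 2 estimate on the 2026-06-06 snapshot, 90-day half-life.
snapshot = date(2026, 6, 6)
for age in (0, 89, 90, 400):
    as_of = snapshot + timedelta(days=age)
    print(age, decayed_class(2, snapshot, as_of, half_life_days=90))
```

The frozen snapshot date is an input, not something the function rewrites, which matches the rule that past estimates stay as issued while their class ages.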

Contingency, not management reserve

The P90 is contingency: known-unknowns the engine can model from the diff and workload (retry storms, output drift, context growth, fallback, demand spikes). It is not management reserve for unknown-unknowns: future scope growth or new features re-enter only through the actuals loop. You approve against a conditional risk band, not a total-project allowance.

Raw payload retention

Ingest preserves provider events in raw_payload and keeps cached cost components disjoint.
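One way to read "disjoint" is that cached and uncached input tokens are stored as separate, non-overlapping counts so neither is billed twice. The sketch below assumes a hypothetical provider event whose input_tokens total includes cached_tokens; the event shape, the IngestedEvent type, and the ingest function are illustrative and not the repo's schema. Only the raw_payload name comes from the text above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestedEvent:
    raw_payload: str            # serialized copy of the full provider event
    uncached_input_tokens: int  # input tokens billed at the normal input rate
    cached_input_tokens: int    # input tokens billed at the cached rate
    output_tokens: int


def ingest(provider_event: dict) -> IngestedEvent:
    """Keep the whole event and split input tokens into non-overlapping parts."""
    usage = provider_event["usage"]
    total_input = usage["input_tokens"]     # assumed to include cached tokens
    cached = usage.get("cached_tokens", 0)  # assumed field name
    if not 0 <= cached <= total_input:
        raise ValueError("cached tokens must be a subset of input tokens")
    return IngestedEvent(
        raw_payload=json.dumps(provider_event, sort_keys=True),
        uncached_input_tokens=total_input - cached,
        cached_input_tokens=cached,
        output_tokens=usage["output_tokens"],
    )


event = {"id": "evt_1", "usage": {"input_tokens": 1000, "cached_tokens": 400, "output_tokens": 200}}
record = ingest(event)
print(record.uncached_input_tokens, record.cached_input_tokens)
```

Because the original event survives in raw_payload, a later change to the split logic can be replayed against stored events instead of guessed.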

Events are not validation

Events measure factor shapes. Estimate class improves only from scarce estimate-actual pairs.

Non-increases do not block

The policy gate cannot block a PR whose P90 delta is zero or negative.
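A minimal sketch of such a gate as a pure function. The ok, warn, and fail status names and the two inputs (P90 delta and declared budget) come from this page; the evaluate name and the 80% warn threshold are assumptions, and the real takeoff/policy.py may differ.

```python
def evaluate(p90_delta: float, budget: float, warn_fraction: float = 0.8) -> str:
    """Return 'ok', 'warn', or 'fail' from the P90 delta and the declared budget."""
    # Non-increases never block, whatever the budget is.
    if p90_delta <= 0:
        return "ok"
    # An increase beyond the declared budget fails the gate.
    if p90_delta > budget:
        return "fail"
    # Assumed rule: warn once the increase uses most of the budget.
    if p90_delta >= warn_fraction * budget:
        return "warn"
    return "ok"


for delta in (-5.0, 0.0, 10.0, 90.0, 150.0):
    print(delta, evaluate(delta, budget=100.0))
```

Checking the non-increase case first means a zero or negative delta returns ok even against a zero budget, so the gate has no path to block a PR that does not raise the P90.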

Open gaps are named

GOAL.md and NEEDS_HUMAN.md keep L2/L3 items out of marketing fantasy.

Evidence checklist

What a technical evaluator can verify locally.

Run the suite

The repo uses tests as the oracle. Monte Carlo behaviour is tested through seeded invariants instead of fragile golden dollar values.

Read the architecture

docs/ARCHITECTURE.md explains the separation between takeoff, cost_engine, Blue Book, snapshots, and autobuild.

Inspect the policy

takeoff/policy.py is pure and narrow: status is ok, warn, or fail based on P90 delta and the declared budget.

Open the snapshots

The site numbers are generated from pricing, spec, capability, actuals, cloud, and structured-pricing snapshots in the repo.

Check the open items

GOAL.md and NEEDS_HUMAN.md identify what is not done, especially L2 actuals and human/legal switches.

Current open items

What the site must not overclaim.

Variance waterfall: needs real post-merge actuals.

License-key gate: requires human key issuance and an asymmetric signing dependency.

Paste-a-diff web demo: still listed as an L1 product-surface task.

General intelligence grades: LMArena/Artificial Analysis remain L2 due to access/scraping constraints.

Proof command

The suite is the oracle.

GOAL checkboxes are not enough. A task counts only when the acceptance test exists and the suite is green.

make test
make demo
make capability
make footprint