Document M-001 · The Method

The Method

On how a single number — a score from zero to one hundred — is rendered defensible.


§ I.   On consensus.

A finding flagged by a single model proves little. The same finding, reproduced independently by seven, is evidence. Our scoring table rewards reproduction and discounts noise.

Schedule A — Penalty by Severity × Consensus
Severity    1 model   2–3 models   4–6 models   7+ models
CRITICAL    12        24           40           60
HIGH        6         12           20           30
MEDIUM      3         6            10           15
LOW         1         2            3            5
INFO        0         0            0            0
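
Schedule A can be read as a lookup keyed by severity and by the number of panel models that reproduced a finding. The sketch below encodes the table as given. The function names penalty and score are hypothetical, and the aggregation in score (penalties subtracted from 100, floored at zero) is an assumption made for illustration only; this document does not state how penalties are combined into the final number.

```python
from bisect import bisect_right

# Schedule A, one penalty per consensus band.
# Bands start at 1, 2, 4 and 7 models: "1 model", "2-3", "4-6", "7+".
BAND_STARTS = [1, 2, 4, 7]
PENALTIES = {
    "CRITICAL": [12, 24, 40, 60],
    "HIGH": [6, 12, 20, 30],
    "MEDIUM": [3, 6, 10, 15],
    "LOW": [1, 2, 3, 5],
    "INFO": [0, 0, 0, 0],
}


def penalty(severity: str, models_agreeing: int) -> int:
    """Return the Schedule A penalty for one finding."""
    if models_agreeing < 1:
        raise ValueError("a finding needs at least one reporting model")
    # Index of the last band whose starting count is <= models_agreeing.
    band = bisect_right(BAND_STARTS, models_agreeing) - 1
    return PENALTIES[severity.upper()][band]


def score(findings) -> int:
    """ASSUMPTION: score = 100 minus the summed penalties, floored at 0.

    `findings` is an iterable of (severity, models_agreeing) pairs.
    The source does not specify this aggregation; it is illustrative.
    """
    total = sum(penalty(sev, n) for sev, n in findings)
    return max(0, 100 - total)
```

Under these assumptions, a CRITICAL finding reproduced by seven models costs 60 points, while the same finding from a single model costs 12, which is the discount on noise the schedule describes.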

§ II.   On the panel.

We commission ten contemporary frontier models for each audit, drawn from independent labs across multiple jurisdictions. Diversity of provenance — different training corpora, different architectures, different national contexts — is what makes the consensus meaningful. The roster is refreshed as new state-of-the-art models are released; each report names the exact panel that signed it.

Schedule B — Standard Panel
EFFECTIVE APR 2026
Model              Laboratory        Origin          Class          Released   Status
Grok 4.3           xAI               United States   Frontier       Mar 2026
Claude Opus 4.7    Anthropic         United States   Frontier       Apr 2026   Newly seated
Gemini 3.1 Pro     Google DeepMind   United States   Frontier       Feb 2026
GPT-5.5 Pro        OpenAI            United States   Frontier       Apr 2026   Newly seated
Llama 4 Maverick   Meta              United States   Open weights   Jan 2026
Qwen 3.6 Plus      Alibaba           China           Frontier       Mar 2026
MiniMax M2.7       MiniMax           China           Frontier       Feb 2026
Kimi K2.6          Moonshot AI       China           Frontier       Mar 2026
Codestral 2508     Mistral           France          Specialist     Aug 2025
DeepSeek V4 Pro    DeepSeek          China           Frontier       Apr 2026   Newly seated

The panel is reviewed quarterly. Models are seated when they reach state-of-the-art on independent benchmarks, and retired when superseded.

§ III.   On the scorer.

A separate model, the scorer, adjudicates the ten panel reports. It is never a member of the panel. Its prompt instructs it to dispute single-model findings, give weight to consensus findings, and produce the executive summary.

§ IV.   On reproducibility.

Every report carries the combined SHA-256 hash of the source as audited. Anyone may confirm that their copy of the source matches that hash and re-run the audit against it; the panel and prompts are documented in each report.
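
One way a combined hash over a source tree might be computed is sketched below. The construction is an assumption: this document does not specify how the per-file hashes are combined, so the function name combined_sha256, the ordering by relative path, and the byte layout fed to the outer hash are all hypothetical. The point is only that a deterministic traversal yields the same digest for the same source.

```python
import hashlib
from pathlib import Path


def combined_sha256(root: str) -> str:
    """Hash every file under `root` into a single SHA-256 hex digest.

    ASSUMPTION: the combination scheme here (sorted relative paths, each
    followed by a NUL byte and that file's SHA-256) is illustrative and
    is not taken from the source document.
    """
    base = Path(root)
    # Sort by POSIX-style relative path so the order is stable across runs.
    files = sorted(
        (p.relative_to(base).as_posix(), p)
        for p in base.rglob("*")
        if p.is_file()
    )
    outer = hashlib.sha256()
    for rel_name, path in files:
        file_digest = hashlib.sha256(path.read_bytes()).digest()
        # Include the path so renaming a file changes the combined hash.
        outer.update(rel_name.encode("utf-8") + b"\0" + file_digest)
    return outer.hexdigest()
```

A reader holding the source could compare the output of such a function with the hash printed in a report, provided both sides use the same combination scheme.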


DOC-M-001 · REV 2.1 EFFECTIVE 2026-01-01