§ I. On consensus.
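A finding flagged by a single model proves little. The same finding, reproduced independently by seven, is evidence. Our scoring table rewards reproduction and discounts noise: each cell is the score assigned to a finding of that severity, according to how many panel models reported it independently.
| Severity | 1 model | 2–3 models | 4–6 models | 7+ models |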
|---|---|---|---|---|
| CRITICAL | 12 | 24 | 40 | 60 |
| HIGH | 6 | 12 | 20 | 30 |
| MEDIUM | 3 | 6 | 10 | 15 |
| LOW | 1 | 2 | 3 | 5 |
| INFO | 0 | 0 | 0 | 0 |
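The table above can be read as a two-step lookup: map the number of reporting models to a consensus band, then read the score for the severity. The sketch below is a minimal illustration of that reading, not the scorer's actual implementation; the names `SCORES`, `band`, and `score` are hypothetical.

```python
# Scores per severity, indexed by consensus band: 1 model, 2-3, 4-6, 7+.
# Values are copied from the table; the names here are illustrative only.
SCORES = {
    "CRITICAL": (12, 24, 40, 60),
    "HIGH": (6, 12, 20, 30),
    "MEDIUM": (3, 6, 10, 15),
    "LOW": (1, 2, 3, 5),
    "INFO": (0, 0, 0, 0),
}


def band(models: int) -> int:
    """Return the column index (0-3) for a count of reporting models."""
    if models < 1:
        raise ValueError("a finding needs at least one reporting model")
    if models == 1:
        return 0
    if models <= 3:
        return 1
    if models <= 6:
        return 2
    return 3


def score(severity: str, models: int) -> int:
    """Look up the score for a severity and a count of reporting models."""
    return SCORES[severity.upper()][band(models)]
```

Under this reading, `score("HIGH", 5)` falls in the 4–6 band, and any INFO finding scores zero regardless of how many models reported it.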
§ II. On the panel.
We commission ten contemporary frontier models for each audit, drawn from independent labs across multiple jurisdictions. Diversity of provenance — different training corpora, different architectures, different national contexts — is what makes the consensus meaningful. The roster is refreshed as new state-of-the-art models are released; each report names the exact panel that signed it.
| Model | Laboratory | Origin | Class | Released |
|---|---|---|---|---|
| Grok 4.3 | xAI | United States | Frontier | Mar 2026 |
| Claude Opus 4.7 (newly seated) | Anthropic | United States | Frontier | Apr 2026 |
| Gemini 3.1 Pro | Google DeepMind | United States | Frontier | Feb 2026 |
| GPT-5.5 Pro (newly seated) | OpenAI | United States | Frontier | Apr 2026 |
| Llama 4 Maverick | Meta | United States | Open weights | Jan 2026 |
| Qwen 3.6 Plus | Alibaba | China | Frontier | Mar 2026 |
| MiniMax M2.7 | MiniMax | China | Frontier | Feb 2026 |
| Kimi K2.6 | Moonshot AI | China | Frontier | Mar 2026 |
| Codestral 2508 | Mistral | France | Specialist | Aug 2025 |
| DeepSeek V4 Pro (newly seated) | DeepSeek | China | Frontier | Apr 2026 |
§ III. On the scorer.
A separate model, the scorer, adjudicates the ten reports. It is never a member of the panel. Its prompt instructs it to dispute single-model findings, give more weight to consensus findings, and produce the executive summary.
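The scorer itself is a model, so its judgment cannot be reduced to code. The bookkeeping it depends on can be sketched, though: counting how many distinct panel models reported each finding, then separating single-model findings from consensus ones. The sketch assumes findings have already been matched across reports to a shared ID, which the source does not describe; `tally` and `partition` are hypothetical names.

```python
from collections import defaultdict


def tally(reports: dict[str, set[str]]) -> dict[str, int]:
    """Map each finding ID to the number of distinct models that reported it.

    `reports` maps a model name to the set of finding IDs in its report.
    Using a set per model means a repeated finding is counted once.
    """
    counts: dict[str, int] = defaultdict(int)
    for findings in reports.values():
        for finding_id in findings:
            counts[finding_id] += 1
    return dict(counts)


def partition(counts: dict[str, int]) -> tuple[list[str], list[str]]:
    """Split finding IDs into (single-model, consensus) lists, each sorted."""
    single = sorted(f for f, n in counts.items() if n == 1)
    consensus = sorted(f for f, n in counts.items() if n >= 2)
    return single, consensus
```

In this sketch the first list holds the findings the scorer is instructed to dispute, and the second holds those it is instructed to weight.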
§ IV. On reproducibility.
Every report carries the combined SHA-256 of the source as audited. Anyone may re-run the audit against source that matches the same hash; each report documents the panel and prompts used.
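The source does not say how the per-file hashes are combined. One plausible scheme, shown here purely as an assumption, hashes each file, then hashes the sorted list of `hash  path` lines, so the result depends on both file contents and relative paths but not on filesystem ordering. The function name `combined_sha256` is hypothetical.

```python
import hashlib
from pathlib import Path


def combined_sha256(root: str) -> str:
    """Return one SHA-256 hex digest covering every file under `root`.

    Assumed scheme (not confirmed by the source): hash each file, then
    hash the manifest of "<file hash>  <relative path>" lines in sorted
    path order.
    """
    base = Path(root)
    # Sort by POSIX-style relative path so the order is the same everywhere.
    files = sorted(
        (p.relative_to(base).as_posix(), p)
        for p in base.rglob("*")
        if p.is_file()
    )
    digest = hashlib.sha256()
    for rel, path in files:
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        digest.update(f"{file_hash}  {rel}\n".encode("utf-8"))
    return digest.hexdigest()
```

A digest computed this way changes if any file is edited, added, removed, or renamed, which is the property a re-run needs in order to confirm it is auditing the same source.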