SA
Securitor
Est. MMXXVI · Independent
Browse reports
Document M-001 · The Method

The Method

On how a single number — a score from zero to one hundred — is rendered defensible.

§ I
On consensus

A finding flagged by a single model proves little. The same finding, reproduced independently by seven, is evidence. Our scoring table rewards reproduction and discounts noise.

The final score begins at 100. Each confirmed finding deducts a penalty determined by two axes: the severity of the finding and the number of independent models that flagged it. A single model reporting a critical vulnerability costs twelve points. Seven models reporting the same critical costs sixty. The gap is the method.

Schedule A — Penalty by Severity × Consensus
Severity 1 model 2–3 4–6 7+
Critical 12 24 40 60
High 6 12 20 30
Medium 3 6 10 15
Low 1 2 3 5
Info 0 0 0 0
§ II
On the panel

We commission ten contemporary frontier models for each audit, drawn from independent laboratories across multiple jurisdictions. No single vendor’s model audits code produced by that same vendor. The panel is constructed to maximise independence: different training data, different architectures, different institutional incentives.

Schedule B — Standard Panel
Model Laboratory Origin Class Released
Grok 4.3 xAI US Frontier Mar 2026
Claude Opus 4.7 Anthropic US Frontier Apr 2026
Gemini 3.1 Pro Google DeepMind US Frontier Feb 2026
GPT-5.5 Pro OpenAI US Frontier Apr 2026
Llama 4 Maverick Meta US Open weights Jan 2026
Qwen 3.6 Plus Alibaba China Frontier Mar 2026
MiniMax M2.7 MiniMax China Frontier Feb 2026
Kimi K2.6 Moonshot AI China Frontier Mar 2026
Codestral 2508 Mistral France Specialist Aug 2025
DeepSeek V4 Pro DeepSeek China Frontier Apr 2026

The panel is reviewed quarterly. Models are seated when they reach state-of-the-art on independent benchmarks, and retired when superseded.

§ III
On the scorer

A separate model — the scorer — adjudicates the ten reports. It reads all individual findings, identifies consensus, resolves conflicts, and computes the final score per the penalty table above. The scorer is never one of the panel. Its sole function is to weigh evidence, not to produce it.

§ IV
On reproducibility

Every report carries the combined SHA-256 of the source as audited. Anyone may re-run the audit against the same hash. The models may update, the findings may shift, but the source under examination is pinned and verifiable.