Compression is cognition.

Cognitive cores small enough to inspect, sharp enough to ship. Drop the trivia, keep the thinking.

The step is the unit.

Progress is discrete. We build for the jumps after they happen.


    Q4 2024
    ████████████████████████
    100B params · frontier
  
    Q2 2025
    ███████████·············
     10B params · workstation
  
    Q4 2025
    ████····················
      1B params · cognitive core  ← where the work is now
  
    Q2 2026
    ██······················
    0.3B params · next step
  
    
Frontier sizes have shrunk in steps, not slopes.

Four pillars.

Compression is not one technique. It is four, in sequence.

01

Pruning

Remove what the model never thinks with.

  • Structured sparsity at the head and channel level.
  • Calibration on the workload, not a public set.
  • Recoverable: every cut logged and reversible.
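One way the recipe above could look in code. This is a sketch under assumptions, not the actual pipeline: `prune_channels`, the activation-weighted score, and the log format are illustrative names.

```python
import numpy as np

def prune_channels(weight, keep_frac=0.5, calib_acts=None):
    """Structured channel pruning: score output channels on a calibration
    batch from the real workload, cut the weakest, log every cut."""
    # weight: (out_channels, in_features); calib_acts: (batch, in_features)
    if calib_acts is None:
        scores = np.linalg.norm(weight, axis=1)            # plain magnitude
    else:
        # score each channel by its response norm on real traffic
        scores = np.linalg.norm(calib_acts @ weight.T, axis=0)
    n_keep = max(1, int(round(keep_frac * weight.shape[0])))
    keep = np.sort(np.argsort(scores)[-n_keep:])
    cut = np.setdiff1d(np.arange(weight.shape[0]), keep)
    log = {"cut_indices": cut, "cut_rows": weight[cut].copy()}  # reversible
    return weight[keep], keep, log

def unprune(pruned, keep, log, out_channels):
    """Reverse a logged cut exactly."""
    restored = np.zeros((out_channels, pruned.shape[1]), dtype=pruned.dtype)
    restored[keep] = pruned
    restored[log["cut_indices"]] = log["cut_rows"]
    return restored
```

The log is what makes the cut recoverable: restoring from it reproduces the original weights bit for bit.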

02

Quantization

Lower the bits, hold the geometry.

  • Mixed-precision per layer, chosen by sensitivity.
  • Integer weights with float activations where it matters.
  • Kernels that respect cache lines on commodity silicon.
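A minimal sketch of sensitivity-driven mixed precision, under assumptions: `quantize`, `layer_error`, and the relative-error threshold `tol` are illustrative, and real kernels would quantize per-group, not per-tensor.

```python
import numpy as np

def quantize(weight, bits):
    """Symmetric per-tensor integer quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weight).max() / qmax
    q = np.clip(np.round(weight / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def layer_error(weight, bits, probe):
    """Sensitivity = output error on a probe batch after quantizing."""
    q, scale = quantize(weight, bits)
    return np.linalg.norm(probe @ weight.T - probe @ (q * scale).T)

def assign_bits(layers, probe, low=4, high=8, tol=0.05):
    """Mixed precision per layer: take the low width when its relative
    output error stays under tol, otherwise fall back to the high width."""
    plan = {}
    for name, w in layers.items():
        ref = np.linalg.norm(probe @ w.T)
        plan[name] = low if layer_error(w, low, probe) <= tol * ref else high
    return plan
```

The probe batch plays the same role as the pruning calibration set: sensitivity is measured on the workload, not assumed.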

03

Distillation

Teach the spike, skip the encyclopedia.

  • Behavior matched on rollouts, not just logits.
  • Curriculum derived from the task graph.
  • Smaller student, sharper edge, same decisions.
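A sketch of one possible loss for this, assuming the standard blend of temperature-scaled logit matching with a hard term on the teacher's rolled-out actions; `distill_loss` and its weights are illustrative, not the actual objective.

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, teacher_actions,
                 alpha=0.5, t=2.0):
    """Blend soft logit matching (KL at temperature t) with hard
    behavior matching on the actions the teacher actually took."""
    p_t = softmax(teacher_logits, t)
    log_p_s = np.log(softmax(student_logits, t) + 1e-12)
    kl = (p_t * (np.log(p_t + 1e-12) - log_p_s)).sum(axis=-1).mean()
    # behavior term: NLL of the teacher's rollout decisions
    log_p = np.log(softmax(student_logits) + 1e-12)
    nll = -log_p[np.arange(len(teacher_actions)), teacher_actions].mean()
    return alpha * (t ** 2) * kl + (1 - alpha) * nll
```

The behavior term is the point: the student is graded on the decisions taken in rollouts, not only on matching the teacher's full distribution.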

04

Verifiability harness

If a step is real, you can prove it cold.

  • Frozen evals on every commit.
  • Drift, regression, and refusal tracked per release.
  • Reports reproducible from a single hash.
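The single-hash property above can be sketched with stdlib tools; `report_hash` and the config/results layout are illustrative assumptions, not the actual harness.

```python
import hashlib
import json

def report_hash(eval_config, results):
    """Canonical hash of frozen eval inputs plus outputs: identical
    config and results always reproduce the identical hash."""
    blob = json.dumps({"config": eval_config, "results": results},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

def verify(eval_config, results, claimed):
    """A report checks out only if rerunning the hash matches the claim."""
    return report_hash(eval_config, results) == claimed
```

Canonical serialization (sorted keys, fixed separators) is what makes the hash reproducible across machines: any drift in the numbers changes the hash, so a release report is verifiable from one string.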

All the spike. None of the encyclopedia.

Capability = Compression × Calibration × Conviction.

  • Capability · what the model can decide cold
  • Compression · how small the decision-making fits
  • Calibration · how the cuts match real traffic
  • Conviction · how the eval is signed

