Pruning
Remove what the model never thinks with.
- Structured sparsity at the head and channel level.
- Calibration on the workload, not a public set.
- Recoverable: every cut logged and reversible.
Cognitive cores small enough to inspect, sharp enough to ship. Drop the trivia, keep the thinking.
Progress is discrete. We build for the jumps after they happen.
Q4 2024 Q2 2025 Q4 2025 Q2 2026
Compression is not one technique. It is four, in sequence.
Remove what the model never thinks with.
Lower the bits, hold the geometry.
Teach the spike, skip the encyclopedia.
If a step is real, you can prove it cold.
All the spike. None of the encyclopedia.
Capability = Compression × Calibration × Conviction .
Capability = Compression × Calibration × Conviction.