Pipeline

Outsource compute. Keep the understanding.

We are scoping a bring-your-own-model compression service. The principle is simple: you keep the weights, the evals, and the reasoning; we return a smaller model that decides the same way.

Currently taking conversations. We will publish a fixed-scope offering once we have run it end-to-end with three teams.

How it works

Three stages. Two weeks of calendar time, mostly compute. You keep everything at every step.

  1. Intake

    You send the FP32 checkpoint and the workload that matters. We read the architecture, profile sensitivity per layer, and propose a compression schedule. NDA first, weights second.

  2. Compression

    Pruning, quantization, and selective distillation run against your eval. Every cut is logged. Every restore path is preserved. We do not delete sources.

  3. Handoff

    You get a quantized checkpoint, a signed eval suite, a model card, and the calibration set used. Everything reproducible from one release hash. If you walk away, you walk away with everything.
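The first two stages can be sketched in miniature. This is an illustrative toy, not the service pipeline itself: layers are plain lists of floats, `evaluate` stands in for your eval suite, and every name here is hypothetical.

```python
def quantize_int8(weights):
    # Symmetric per-tensor INT8: map weights onto [-127, 127] integers
    # and back through one scale factor.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) * scale for w in weights]

def sensitivity_report(layers, evaluate):
    # Intake: quantize one layer at a time and record the eval delta
    # against the FP32 baseline. Large delta = sensitive layer.
    baseline = evaluate(layers)
    deltas = {}
    for name in layers:
        trial = dict(layers)
        trial[name] = quantize_int8(layers[name])
        deltas[name] = baseline - evaluate(trial)
    return deltas

class CompressionRun:
    # Compression: every cut is logged with its source tensor,
    # so any layer can be restored. Nothing is deleted.
    def __init__(self, layers):
        self.layers = dict(layers)
        self.log = []

    def cut(self, name, op, transform):
        self.log.append((name, op, self.layers[name]))
        self.layers[name] = transform(self.layers[name])

    def restore(self, name):
        for layer, _, before in reversed(self.log):
            if layer == name:
                self.layers[name] = before
                return
```

The sensitivity report drives the proposed compression schedule; the log is what makes "every restore path is preserved" a mechanical property rather than a promise.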

What you receive

  • Quantized weights (INT8 or per-layer mixed precision).
  • The frozen eval suite we ran against, plus the seed.
  • A model card with per-layer sensitivity and the accuracy delta against your FP32 baseline.
  • A deployment guide for the target hardware you specified.
  • A release hash that reproduces all of the above from the source you sent.
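The release hash can be as simple as a digest over a canonical encoding of every artifact in the list above. A sketch, assuming a JSON-serializable manifest (the manifest fields shown are hypothetical):

```python
import hashlib
import json

def release_hash(manifest):
    # Canonical JSON (sorted keys, fixed separators) so the same
    # artifacts always reproduce the same hash, byte for byte.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_handoff(manifest, claimed_hash):
    # Anyone holding the artifacts can recompute and compare;
    # no trust in the sender is required.
    return release_hash(manifest) == claimed_hash
```

Because the encoding is canonical, the hash is stable across machines and key orderings, and any change to a single artifact changes it.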

Reach out

If you are shipping a model into production and the inference cost is becoming a problem, write us. Lead with the model and the target hardware.

[email protected]
