Squish — Andy Dixon

Every token you send to a model costs money and eats into the context window. A lot of what goes into a prompt is filler — repeated phrasing, low-salience sentences, boilerplate the model does not need. Squish removes it deterministically, so the same input always produces the same shorter output.

Token cost: lower

No inference: pure

Same in, same out: deterministic

How it works · the pipeline

Four stages, no model in the loop.

Compression runs as a single pure transformation. There is no API key to supply and nothing leaves the process — content flows through four stages and comes out smaller:

content → extract → synthesise → prioritise → adapt → compressed
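The flow above can be sketched as plain function composition. This is only an illustration of the shape: the engine is closed-source, so `run_pipeline` and the four placeholder stage bodies below are hypothetical stand-ins, not Squish's code.

```python
from functools import reduce
from typing import Callable, List

# A stage is any pure function from text to text.
Stage = Callable[[str], str]


def run_pipeline(content: str, stages: List[Stage]) -> str:
    """Feed content through each stage in order; nothing is stored between calls."""
    return reduce(lambda text, stage: stage(text), stages, content)


# Hypothetical placeholder stages. The real ones do far more work.
def extract(text: str) -> str:
    return text.strip()


def synthesise(text: str) -> str:
    return " ".join(text.split())  # collapse runs of whitespace


def prioritise(text: str) -> str:
    return text  # no budget applied in this placeholder


def adapt(text: str) -> str:
    return text  # no model framing in this placeholder


STAGES = [extract, synthesise, prioritise, adapt]
```

Because every stage is a pure function of its input, calling `run_pipeline(text, STAGES)` twice on the same text gives the same result, which is the property the page describes.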

Extract

Spike extractor

Scores every sentence by keyword salience and keeps the ones carrying the most meaning, dropping the low-signal filler around them.
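One simple way to score sentences by keyword salience is word frequency: a sentence scores higher when its non-stopword terms recur across the text. The sketch below assumes that approach; the stopword list, `keep_ratio` and every function name are hypothetical, and the actual extractor may score quite differently.

```python
import math
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption, not Squish's).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "be", "are"}


def split_sentences(text: str) -> List[str]:
    """Naive split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keywords(sentence: str) -> List[str]:
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower())
            if w not in STOPWORDS]


def extract_salient(text: str, keep_ratio: float = 0.5) -> str:
    """Keep the top-scoring share of sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(w for s in sentences for w in keywords(s))

    def score(sentence: str) -> float:
        words = keywords(sentence)
        # Average document frequency of the sentence's keywords.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, math.ceil(len(sentences) * keep_ratio))
    # Highest score first; ties go to the earlier sentence, so output is stable.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))
```

Breaking ties by position keeps the result deterministic, so the same input always selects the same sentences.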

Synthesise

Context synthesiser

Removes duplication, shortens common phrases and abbreviates — tightening the wording without changing what it says.
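As a rough illustration of this stage, the sketch below drops repeated sentences and swaps a few wordy phrases for shorter ones. The phrase table and function names are assumptions made for the example; Squish's own rules are not published.

```python
import re
from typing import Dict, List

# Hypothetical phrase table: wordy form -> shorter form with the same meaning.
PHRASES: Dict[str, str] = {
    "in order to": "to",
    "due to the fact that": "because",
    "for example": "e.g.",
}


def shorten_phrases(text: str) -> str:
    """Replace each wordy phrase, matching case-insensitively on word boundaries."""
    for long_form, short_form in PHRASES.items():
        pattern = r"\b" + re.escape(long_form) + r"\b"
        text = re.sub(pattern, short_form, text, flags=re.IGNORECASE)
    return text


def drop_duplicates(sentences: List[str]) -> List[str]:
    """Keep the first occurrence of each sentence, ignoring case and spacing."""
    seen = set()
    kept = []
    for sentence in sentences:
        key = " ".join(sentence.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


def synthesise(text: str) -> str:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return shorten_phrases(" ".join(drop_duplicates(sentences)))
```

Note that this simple version does not restore capitalisation when a replaced phrase starts a sentence; a fuller implementation would handle that.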

Prioritise

Token prioritiser

Enforces a maxTokens budget, trimming the lowest-priority content first so the result fits inside the limit you set.
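A budget like this can be enforced greedily: total up the tokens, then remove the lowest-priority pieces until the total fits. In the sketch below the priority scores are supplied by the caller and tokens are approximated by a whitespace word count; both are assumptions for illustration, since real tokenisers count differently and the page does not say how Squish ranks content.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokeniser: count whitespace-separated words."""
    return len(text.split())


def fit_to_budget(scored: List[Tuple[str, float]], max_tokens: int) -> str:
    """Drop the lowest-priority sentences first until the total fits max_tokens.

    `scored` holds (sentence, priority) pairs in document order.
    """
    kept = list(range(len(scored)))
    total = sum(estimate_tokens(scored[i][0]) for i in kept)
    # Removal order: lowest priority first; on ties, later sentences go first.
    for i in sorted(kept, key=lambda i: (scored[i][1], -i)):
        if total <= max_tokens:
            break
        total -= estimate_tokens(scored[i][0])
        kept.remove(i)
    # Survivors are emitted in their original order.
    return " ".join(scored[i][0] for i in kept)
```

If even the highest-priority sentence is over budget, this version returns an empty string rather than truncating mid-sentence; that is a design choice of the sketch, not a documented behaviour.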

Adapt

Multi-AI adapter

Adds model-specific framing for the target — Claude, GPT or Cursor — so the compressed prompt arrives in a shape each model reads well.

Tuning · the controls

Compression level

conservative, balanced or aggressive — how hard to push, from a gentle trim to a deep squeeze.

Target model

generic, claude, gpt or cursor — tailors the framing of the output to where the prompt is headed.

Token budget

A maxTokens ceiling the result is held under, so a compressed prompt always fits the window you have left.
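If you drive Squish from your own code, it can help to validate these three controls before sending anything. The sketch below uses the level and model names listed above; the class name, the defaults and the `max_tokens` spelling are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Values taken from the controls described on this page.
LEVELS = ("conservative", "balanced", "aggressive")
TARGETS = ("generic", "claude", "gpt", "cursor")


@dataclass(frozen=True)
class SquishOptions:
    """Hypothetical container for the three controls; defaults are arbitrary."""
    level: str = "balanced"
    target: str = "generic"
    max_tokens: int = 2000

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"level must be one of {LEVELS}")
        if self.target not in TARGETS:
            raise ValueError(f"target must be one of {TARGETS}")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
```

For instance, `SquishOptions(level="aggressive", target="claude", max_tokens=500)` is accepted, while an unknown level raises `ValueError` before any request is made.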

Each run reports live metrics: compression percentage, tokens saved, an estimated speedup and a quality score — so you can see the trade-off, not just take it on trust.
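The first two metrics follow directly from the token counts before and after compression. Here is a minimal sketch of that arithmetic; the function and key names are hypothetical, and the speedup and quality score are left out because the page does not describe how they are calculated.

```python
def compression_metrics(original_tokens: int, compressed_tokens: int) -> dict:
    """Return tokens saved and the percentage reduction, rounded to one decimal."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    saved = original_tokens - compressed_tokens
    return {
        "tokens_saved": saved,
        "compression_pct": round(100 * saved / original_tokens, 1),
    }
```

A prompt that goes from 1000 tokens to 640 saves 360 tokens, a 36.0% reduction.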

Connect · web app & API

A single page to paste into, or a JSON API to call.

Open the web UI, paste some content, pick a level, model and budget, and watch the compressed output appear with its metrics, a copy button and a ready-to-run curl command. Or skip the UI and hit the API directly — every response is JSON with a success flag.

Compress a single prompt with POST /api/optimise, batch up to fifty at once with POST /api/batch, or use the optimise-for-ai shorthand to target a model in one line. GET /api/stats lists the supported models and levels; /health is there for liveness checks.
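For calling the API from Python, the sketch below builds a POST request for /api/optimise and splits a long list of prompts into groups of at most fifty for /api/batch. Only the endpoint paths, the host and the fifty-prompt limit come from this page. The https scheme, the JSON field names (`content`, `level`, `model`, `maxTokens`) and the defaults are guesses, so check them against the curl command the web UI generates before relying on them.

```python
import json
from typing import Iterator, List
from urllib.request import Request

BASE_URL = "https://squish.dixon.cx"  # host from this page; scheme assumed
BATCH_LIMIT = 50                      # "batch up to fifty at once"


def build_optimise_request(content: str, level: str = "balanced",
                           model: str = "generic", max_tokens: int = 2000) -> Request:
    """Build (but do not send) a POST to /api/optimise.

    The payload field names are assumptions, not the documented schema.
    """
    payload = {"content": content, "level": level,
               "model": model, "maxTokens": max_tokens}
    return Request(
        BASE_URL + "/api/optimise",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chunk_for_batch(prompts: List[str]) -> Iterator[List[str]]:
    """Yield groups of at most BATCH_LIMIT prompts, one group per batch call."""
    for start in range(0, len(prompts), BATCH_LIMIT):
        yield prompts[start:start + BATCH_LIMIT]
```

To send a request, pass it to `urllib.request.urlopen`, parse the body with `json.load`, and check the success flag that every response carries.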

Contact · free to use

Free to use — go and squish something.

Squish is handy anywhere prompts get long and tokens add up — RAG context, agent scaffolding, batch jobs. The web app and API are free to use; the compression engine is closed-source. Open it up and try it on your own content.

squish.dixon.cx

Questions, or help integrating it?

Anything about the approach, the API, or wiring it into a pipeline — drop me a line directly.

Send me a message →