Every token you send to a model costs money and eats into the context window. A lot of what goes into a prompt is filler — repeated phrasing, low-salience sentences, boilerplate the model does not need. Squish removes it deterministically, so the same input always produces the same shorter output.
Lower
Pure
Deterministic
Four stages, no model in the loop.
Compression runs as a single pure transformation. There is no API key to supply and nothing leaves the process — content flows through four stages and comes out smaller:
content → extract → synthesise → prioritise → adapt → compressed
Spike extractor
Scores every sentence by keyword salience and keeps the ones carrying the most meaning, dropping the low-signal filler around them.
Context synthesiser
Removes duplication, shortens common phrases and applies abbreviations, tightening the wording without changing what it says.
Token prioritiser
Enforces a maxTokens budget, trimming the lowest-priority content first so the result fits inside the limit you set.
Multi-AI adapter
Adds model-specific framing for the target — Claude, GPT or Cursor — so the compressed prompt arrives in a shape each model reads well.
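Since the engine itself is closed-source, the four stages above can only be illustrated, not reproduced. The following is a minimal sketch of how such a pipeline might be composed from pure functions. The stop-word list, phrase table, framing templates, `keep_ratio` parameter and whitespace token estimate are all assumptions made for illustration, not Squish's actual rules. In this sketch the budget applies to the body before framing is added.

```python
import re
from collections import Counter

# All three tables are hypothetical placeholders, not Squish's real data.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to"}
PHRASES = {"in order to": "to", "for example": "e.g."}
FRAMING = {"generic": "{}", "claude": "<context>\n{}\n</context>"}


def _words(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def _tokens(text):
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def extract(content, keep_ratio=0.75):
    """Stage 1: score sentences by keyword frequency, keep the top share in order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", content) if s.strip()]
    freq = Counter(w for s in sentences for w in _words(s) if w not in STOP_WORDS)
    scored = []
    for s in sentences:
        keywords = [w for w in _words(s) if w not in STOP_WORDS]
        score = sum(freq[w] for w in keywords) / len(keywords) if keywords else 0.0
        scored.append((s, score))
    keep = max(1, int(len(scored) * keep_ratio))
    # Sorting by (-score, index) makes tie-breaking deterministic.
    top = sorted(range(len(scored)), key=lambda i: (-scored[i][1], i))[:keep]
    return [scored[i] for i in sorted(top)]


def synthesise(scored):
    """Stage 2: shorten known phrases, then drop sentences already seen."""
    seen, out = set(), []
    for sentence, score in scored:
        for long, short in PHRASES.items():
            sentence = re.sub(re.escape(long), short, sentence, flags=re.IGNORECASE)
        key = " ".join(_words(sentence))
        if key not in seen:
            seen.add(key)
            out.append((sentence, score))
    return out


def prioritise(scored, max_tokens):
    """Stage 3: drop the lowest-scoring sentence until the budget is met."""
    kept = list(scored)
    while kept and sum(_tokens(s) for s, _ in kept) > max_tokens:
        # On ties the later sentence is dropped first.
        drop = min(range(len(kept)), key=lambda i: (kept[i][1], -i))
        kept.pop(drop)
    return kept


def adapt(scored, target="generic"):
    """Stage 4: wrap the body in a (hypothetical) per-model frame."""
    body = " ".join(s for s, _ in scored)
    return FRAMING.get(target, "{}").format(body)


def compress(content, max_tokens=50, target="generic"):
    return adapt(prioritise(synthesise(extract(content)), max_tokens), target)
```

Because every stage is a plain function of its input with explicit tie-breaking, calling `compress` twice on the same text gives the same result, which is the property the page describes.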
Compression level
conservative, balanced or aggressive — how hard to push, from a gentle trim to a deep squeeze.
Target model
generic, claude, gpt or cursor — tailors the framing of the output to where the prompt is headed.
Token budget
A maxTokens ceiling that the result is kept under, so a compressed prompt always fits the window you have left.
Each run reports live metrics: compression percentage, tokens saved, an estimated speedup and a quality score — so you can see the trade-off, not just take it on trust.
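As a rough illustration of the first two metrics, the sketch below derives tokens saved and compression percentage from before-and-after token counts. The `Metrics` type, the `measure` function and the whitespace-based `estimate_tokens` heuristic are assumptions for this example. How Squish computes its speedup estimate and quality score is not documented here, so they are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    original_tokens: int
    compressed_tokens: int
    tokens_saved: int
    compression_pct: float


def estimate_tokens(text: str) -> int:
    # Assumed heuristic: one token per whitespace-separated word.
    return len(text.split())


def measure(original: str, compressed: str) -> Metrics:
    before = estimate_tokens(original)
    after = estimate_tokens(compressed)
    saved = before - after
    # Guard against division by zero for empty input.
    pct = round(100 * saved / before, 1) if before else 0.0
    return Metrics(before, after, saved, pct)
```

For example, an eight-word prompt squeezed to six words gives two tokens saved and a compression of 25.0 percent under this heuristic.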
A single page to paste into, or a JSON API to call.
Open the web UI, paste some content, pick a level, model and budget, and watch the compressed output appear with its metrics, a copy button and a ready-to-run curl command. Or skip the UI and hit the API directly — every response is JSON with a success flag.
Compress a single prompt with POST /api/optimise, batch up to fifty at once with POST /api/batch, or use the optimise-for-ai shorthand to target a model in one line. GET /api/stats lists the supported models and levels; /health is there for liveness checks.
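A call to the single-prompt endpoint might look something like the sketch below, using only the Python standard library. The endpoint path, the `maxTokens` name, the fifty-item batch limit and the `success` flag come from this page. The `https` scheme, the other JSON field names (`content`, `level`, `model`) and the default values are guesses, so check them against the curl command the web UI generates before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://squish.dixon.cx"  # host from this page; https is assumed
BATCH_LIMIT = 50  # "batch up to fifty at once"


def build_optimise_request(content, level="balanced", model="generic", max_tokens=500):
    # Field names other than maxTokens are assumptions, not the documented schema.
    payload = {"content": content, "level": level, "model": model, "maxTokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/api/optimise",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chunk_for_batch(prompts, size=BATCH_LIMIT):
    # Split a long list into groups small enough for one batch call each.
    return [prompts[i:i + size] for i in range(0, len(prompts), size)]


def send(request, timeout=10):
    # Performs the network call and checks the success flag on the JSON reply.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(f"Squish reported failure: {body}")
    return body
```

Keeping request construction separate from `send` makes the payload easy to inspect or log before anything goes over the network.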
Free to use — go and squish something.
Squish is handy anywhere prompts get long and tokens add up — RAG context, agent scaffolding, batch jobs. The web app and API are free to use; the compression engine is closed-source. Open it up and try it on your own content.
squish.dixon.cx
Questions, or help integrating it?
Anything about the approach, the API, or wiring it into a pipeline — drop me a line directly.
Send me a message →