The Macro: Every LLM Application Is Paying for Tokens It Does Not Need
LLM costs scale with token count. Every extra word, every redundant paragraph, every verbose context window adds to the bill. For companies processing millions of LLM requests per day, the difference between a 5,000-token prompt and a 2,000-token prompt is tens of thousands of dollars per month.
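The arithmetic behind that claim is easy to check. The sketch below assumes one million requests per day and a hypothetical provider price of $1.00 per million input tokens (real prices vary widely by model); at those numbers, cutting a 5,000-token prompt to 2,000 tokens saves $90,000 per month.

```python
# Back-of-envelope check of the prompt-size cost gap.
# Assumptions (not from the article): 1M requests/day,
# $1.00 per million input tokens.
REQUESTS_PER_DAY = 1_000_000
PRICE_PER_MILLION_TOKENS = 1.00  # USD, assumed

def monthly_cost(tokens_per_prompt: int, days: int = 30) -> float:
    """Monthly input-token spend for a given prompt size."""
    daily_tokens = REQUESTS_PER_DAY * tokens_per_prompt
    return daily_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS * days

savings = monthly_cost(5_000) - monthly_cost(2_000)
print(f"${savings:,.0f}/month saved")  # → $90,000/month saved
```

At cheaper models the gap shrinks, but it stays in the tens of thousands of dollars for any serious request volume.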
The industry has been attacking this problem from the supply side: cheaper models, longer context windows, better caching. But almost nobody is attacking it from the demand side: making the prompts themselves more efficient.
Most LLM prompts are bloated. System prompts contain redundant instructions. Context windows include retrieved documents that are mostly irrelevant. Chat histories accumulate tokens from earlier in the conversation that no longer matter. All of these tokens cost money and add latency without improving the response.
The Token Company, backed by Y Combinator, provides compression middleware that sits between your application and the LLM. Their models strip context bloat from prompts before they reach the model, reducing token count by 66% and cutting costs proportionally.
The Micro: An 18-Year-Old ML Researcher Solving Token Economics
Otso Veistera founded The Token Company at age 18. He is described as an ML researcher and national physics champion, and he is building core optimization infrastructure for LLMs with a team of two in San Francisco.
The compression models (bear-1 and bear-1.1) use machine learning to identify and remove redundant content from prompts while preserving the information the LLM actually needs to generate a good response. The pricing is $0.05 per million compressed tokens, which is a fraction of the cost of the tokens you save.
The most interesting claim is that compression does not just maintain output quality. It improves it. In a blind LLM arena case study, compressed requests increased user preference and lifted purchase volume by 5%. This makes intuitive sense: removing noise from a prompt makes it easier for the model to focus on the relevant content, just like editing makes prose better by cutting unnecessary words.
The performance numbers are compelling: 66% token reduction, 3x cost savings, and up to 37% faster response times. The two headline figures are the same claim viewed two ways: a 66% reduction leaves roughly a third of the tokens, and a third of the tokens means roughly a third of the input cost. The latency improvement comes from the model processing fewer tokens, which reduces both time-to-first-token and total generation time.
The integration is designed to be a drop-in middleware. No prompt engineering changes. No application rewrites. Just route your LLM calls through The Token Company’s API first.
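The drop-in pattern looks something like the sketch below. This is not The Token Company's actual SDK: the endpoint, function names, and the stand-in `compress` (which just collapses repeated whitespace so the example runs offline) are all illustrative assumptions. The point is the shape of the integration, a single extra hop in front of the existing provider call.

```python
import re

def compress(prompt: str) -> str:
    """Stand-in for a call to a (hypothetical) compression API.

    A real middleware would POST the prompt to the vendor's endpoint;
    here we just collapse whitespace so the sketch is self-contained.
    """
    return re.sub(r"\s+", " ", prompt).strip()

def call_llm(prompt: str) -> str:
    """Stand-in for the real provider call (OpenAI, Anthropic, etc.)."""
    return f"[response to {len(prompt)}-char prompt]"

def call_llm_compressed(prompt: str) -> str:
    # The only change to application code: route through compression first.
    return call_llm(compress(prompt))

bloated = "Summarize   the   following   report:\n\n   ..."
print(call_llm_compressed(bloated))
```

Because the wrapper has the same signature as the original call, existing prompt templates and application logic stay untouched, which is exactly what "no application rewrites" would have to mean in practice.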
Competitors include various prompt optimization techniques (manual prompt engineering, few-shot learning optimization) and emerging context window management tools. But a dedicated compression middleware with measurable quality improvements is a distinct product category.
The Verdict
The Token Company is solving a problem that every LLM application has and most teams ignore: token bloat is universal, yet the standard responses are to accept the cost or to optimize prompts by hand.
At 30 days: how many LLM API calls are being routed through The Token Company’s compression, and what is the average compression ratio across different use cases?
At 60 days: does the quality improvement hold up across different LLM providers and use cases? Compression that works for chat might not work for code generation.
At 90 days: are large enterprises integrating compression into their LLM infrastructure as a standard layer?
I think this is a clever product. The economics are straightforward: if compression saves 3x on LLM costs and the compression itself costs five cents per million compressed tokens, the ROI is immediate. The quality improvement claim, if validated broadly, turns this from a cost optimization tool into a performance enhancement tool. That is a much stronger selling proposition.
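A quick breakeven check makes the ROI concrete. Using the stated 66% reduction and $0.05 per million compressed tokens, and assuming a hypothetical provider price of $1.00 per million input tokens (not from the article), every million original tokens yields about $0.66 in savings against under two cents in compression fees.

```python
# ROI per 1M original input tokens.
# Stated by the vendor: 66% reduction, $0.05 per 1M *compressed* tokens.
# Assumed for illustration: provider charges $1.00 per 1M input tokens.
original_tokens = 1_000_000
reduction = 0.66
provider_price = 1.00       # USD per 1M input tokens, assumed
compression_price = 0.05    # USD per 1M compressed tokens, stated

compressed_tokens = original_tokens * (1 - reduction)   # ≈ 340,000
saved = (original_tokens - compressed_tokens) / 1e6 * provider_price
fee = compressed_tokens / 1e6 * compression_price
print(f"saved ${saved:.2f}, fee ${fee:.3f}, net ${saved - fee:.3f} per 1M tokens")
```

The fee stays a small fraction of the savings for any provider price above a few cents per million tokens, which is why the pitch works even on the cheapest models.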