April 15, 2026 edition

Caveman: Cut Claude API Token Usage by 75%

Claude Code · Token Optimization · Open Source · Prompt Engineering · API Cost Reduction

The tagline says it all: “Why use so many token when few do trick?”

Caveman is a Claude Code skill plugin from JuliusBrussee that claims to cut roughly 75% of Claude’s output tokens without losing technical accuracy, and it’s sitting at 31.6k GitHub stars right now. That’s not a typo. For a prompt-engineering utility that does exactly one job, that number is genuinely hard to argue with.

Here’s the short version of what it does: you install it in one line, you pick a grunt level, and your AI coding assistant starts talking like a cavewoman who learned to code. Fewer tokens out means less API spend and faster responses. If you’re burning through Claude API credits on verbose explanations nobody asked for, Caveman is aimed at you.

The plugin slots into Claude Code, Cursor, Windsurf, Copilot, and a handful of other tools. One-line install. That’s the pitch.

Grunts on a Spectrum

The four grunt levels are where Caveman gets interesting as a design choice. Most token-reduction tools are binary: you either apply some compression or you don't. Caveman gives you a dial. According to the repo, the grunt levels range from slightly terse to something approaching full caveman mode, where the AI strips prose down to the functional minimum.

This matters more than it sounds. There are workflows where you want a complete explanation with context, and there are workflows where you just want the function signature and the error fixed. The fact that Caveman doesn’t force you to choose one mode permanently is a real usability decision, not a gimmick.
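The dial idea is easy to picture as a tiered instruction map. The level wording below is invented purely for illustration; Caveman's real definitions live in the plugin itself, and nothing here is taken from the repo:

```python
# Hypothetical sketch of a tiered verbosity spec, in the spirit of
# Caveman's grunt levels. These instructions are invented for
# illustration and are NOT the plugin's actual definitions.
GRUNT_LEVELS = {
    1: "Be concise. Trim pleasantries and restatements of the question.",
    2: "Short answers. Code first, one-sentence explanation after.",
    3: "Code and fixes only. Explain only when asked.",
    4: "Minimum viable tokens. Fragments are fine. No prose beyond essentials.",
}

def build_system_prompt(base_prompt: str, grunt_level: int) -> str:
    """Append the selected verbosity tier to a base system prompt."""
    if grunt_level not in GRUNT_LEVELS:
        raise ValueError(f"grunt level must be 1-4, got {grunt_level}")
    return f"{base_prompt}\n\nOutput style: {GRUNT_LEVELS[grunt_level]}"
```

The point of the dial, as the sketch suggests, is that switching modes is a one-parameter change rather than a rewrite of your whole prompt setup.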

It also ships with terse commit generation and one-line PR reviews built in. Commit messages are one of those things that AI assistants tend to absolutely over-engineer, producing three-paragraph summaries for a two-line fix. Trimming that down is a legitimate quality-of-life improvement for anyone doing code review at speed.

Input compression is also in the feature set, which means it’s not just squashing output. It’s compressing what goes in too. That’s a broader token-efficiency story than just “make the AI shut up faster.”

The Actual Technical Argument

Let me be specific about why 75% token reduction is worth taking seriously as a claim.

Large language model APIs like Claude charge by token, both input and output. For developers using AI-assisted coding in any serious volume, output tokens from a verbose assistant can add up fast, especially when the model insists on explaining its reasoning in complete paragraphs before giving you the three lines of code you actually needed. If Caveman’s number holds, you’re looking at potentially cutting a meaningful chunk of API costs for teams that run these tools heavily.
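To make that concrete, here's a back-of-envelope savings calculation. The per-token price and daily volume are illustrative placeholders, not actual Anthropic pricing, and the 75% figure is the repo's claim taken at face value:

```python
# Back-of-envelope savings from a 75% output-token cut.
# Price and volume below are assumed for illustration only.
PRICE_PER_M_OUTPUT = 15.00  # $ per million output tokens (placeholder, not real pricing)

def monthly_output_cost(tokens_per_day: int, price_per_m: float, days: int = 30) -> float:
    """Monthly spend on output tokens at a flat per-million rate."""
    return tokens_per_day * days * price_per_m / 1_000_000

baseline = monthly_output_cost(2_000_000, PRICE_PER_M_OUTPUT)  # verbose default
trimmed = baseline * (1 - 0.75)                                # with the claimed 75% cut
print(f"baseline ${baseline:.2f}/mo -> trimmed ${trimmed:.2f}/mo")
# prints: baseline $900.00/mo -> trimmed $225.00/mo
```

Swap in your own volume and current rates; the structure of the argument is the same either way.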

The repo has 1.5k forks and 140 commits as of today, April 15, 2026. That’s a project with real contributor activity, not a demo someone posted and forgot. The fact that it’s a SKILL.md-based implementation means the mechanism is a structured prompt system that the LLM treats as behavioral instructions, not a wrapper that intercepts and rewrites API calls. It’s elegant in a weird, low-tech way. You’re not patching anything. You’re just handing the model a personality spec.
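For context on the mechanism: Claude Code skills are defined as a SKILL.md file, YAML frontmatter (a name and a description) followed by plain-language instructions the model follows when the skill activates. The miniature below is a hypothetical sketch in that shape, not Caveman's actual file:

```markdown
---
name: caveman-style
description: Hypothetical sketch of a verbosity-trimming skill (not Caveman's real SKILL.md)
---

When responding, strip prose to the functional minimum:
- Lead with code or the fix; skip restating the question.
- One sentence of explanation, only if the change is non-obvious.
- Commit messages: a single imperative line.
```

That's the whole trick: no middleware, no API interception, just behavioral instructions the model reads like any other context.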

The most recent commit, which synced SKILL.md copies and auto-activation rules, shows the project is still actively maintained. That’s a good sign.

The Broader Context for This Kind of Tool

There’s a whole class of tools emerging right now that exist specifically because foundation model defaults are too chatty. The models are trained to be helpful, and “helpful” in RLHF-land often means thorough, which means long. For developers, thoroughness is frequently the enemy of flow state. You want the answer, not the explanation of why the answer is correct.

Caveman is part of this broader correction. Other tools in this space approach it by building custom system prompts, adding middleware layers, or just tweaking temperature settings and hoping for the best. Caveman's approach is more opinionated: it gives you a defined persona with defined behavior tiers, so you know what you're getting when you set grunt level 3.

Whether that approach generalizes beyond individual developer workflows into team settings is an open question. You’d need some way to standardize the SKILL.md install across a team’s setups, which is possible but adds coordination overhead.

Open Source, Stars, and What They Mean

31.6k stars on a developer tool is a real signal. GitHub stars are noisy as a metric, and I say that as someone who has watched plenty of overnight-star-farm repos collapse into abandonment. But combined with 1.5k forks, 140 commits, and 10 tagged releases, the picture here looks more like sustained organic interest than a flash-in-the-pan viral moment.

The product got solid traction on launch day, surfacing as one of the day's top launches. That's consistent with the GitHub trajectory.

What drives this kind of star count for a utility plugin, I think, is that the problem it solves is one every developer using Claude Code has felt personally. The AI explains too much. You’ve thought “just give me the code” at least once. Caveman is a direct answer to that specific frustration, and when a tool names a pain point clearly, developers share it.

What I’d Actually Want to Know

The 75% token reduction claim is stated plainly, but I haven't seen a rigorous benchmark attached to it in the source material. That's the number I'd want to stress-test before committing to it in a cost analysis. Is that 75% on code-heavy tasks specifically? On explanation-heavy tasks? Across a mix? The difference matters a lot depending on how you use your coding assistant.

The grunt level behavior is also something I’d want to see documented more explicitly. What does level 1 actually suppress versus level 4? Are there categories of output that always get cut regardless of level, like boilerplate disclaimers? I’d want a concrete comparison table.

I’d also want to understand the PR review feature better. “One-line PR reviews” is a bold claim. A one-line PR review that actually captures the right signal is genuinely useful; one that misses the thing that matters is worse than no review at all. That feature lives or dies on how well the compression preserves semantic priority, not just length.

The Name, Though

Caveman is doing real work as a name here. It’s funny, it’s memorable, and it sets expectations correctly. You install this thing knowing you’re going to get grunts, not essays. That’s good product naming. It’s self-aware in a way that a lot of developer tools aren’t. Most devtools name themselves after abstract nouns or compound words that mean nothing. Caveman tells you the whole story in one word and makes you laugh.

It’s also, I’d argue, a smart positioning choice for an open-source tool that needs word-of-mouth to spread. You tell a colleague “I installed Caveman on my Claude Code setup and it’s great” and they immediately ask what it is and why it’s called that. You’ve already got their attention.

JuliusBrussee shipped something with a clear opinion about what LLM output should look like in a coding context, and that clarity is visible in both the feature set and the branding. That’s not common.

For anyone currently watching their Claude API bill climb and wishing their coding assistant would just answer the question, this is worth ten minutes to install and try. The one-line setup means you’re not committing to anything.

The HUGE Brief

Weekly startup features, shipped every Friday.