The Macro: The SEO Tool Market Has a Subscription Problem
Somewhere around 2019, the SEO software category made a quiet collective decision: everything should be SaaS, everything should be cloud-hosted, and everyone should pay forever. Ahrefs, Semrush, Screaming Frog (partially), Sitebulb. They all follow roughly the same model. You get a dashboard, a crawl quota, and a monthly invoice.
The market is big enough to support all of it. According to Grand View Research, the martech sector was valued at $551.96 billion in 2025 and is projected to grow at a CAGR of 20.1% through 2033. Some portion of that is SEO tooling, and that portion has been extraordinarily loyal to the subscription model. Which makes sense from a business perspective. It makes less sense if you’re a developer, a freelancer, or an agency that crawls fifty sites a month and doesn’t want to hand that data to a third party.
The local-first angle is underexplored. Most tools in this category treat your crawl data as their asset by default. It sits on their servers, feeds their benchmarks, and occasionally shows up in aggregate reports they sell back to you. If you’d like to understand what’s actually hitting your site and why, you usually have to pay someone else to tell you.
Screaming Frog is the closest comparison most people reach for. It’s genuinely good and genuinely expensive for teams. Beyond that, the CLI space for SEO crawling is fragmented, mostly a matter of stitching together open-source scripts that don’t talk to each other cleanly.
That’s the gap Crawler.sh is betting on.
The Micro: Sixteen Checks, Clean Markdown, Zero Cloud
Crawler.sh is a local-first web crawler that ships in two forms: a native desktop app and a CLI. Both run on your machine. Nothing is sent to an external server during a crawl. You point it at a domain and it works outward from there, staying within that domain, with configurable concurrency, depth limits, and request delays built in so you’re not accidentally DDoS-ing someone’s blog.
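The throttling knobs described above (a concurrency cap plus a per-request delay) are a standard pattern, and worth seeing in miniature. This is a generic sketch of that pattern, not Crawler.sh's actual implementation; the function and parameter names are illustrative, and the sleep stands in for a real HTTP request.

```python
import asyncio

async def polite_crawl(urls, max_concurrency=4, delay=0.5):
    # A semaphore caps how many requests are in flight at once,
    # and the delay spaces requests out so the target isn't hammered.
    sem = asyncio.Semaphore(max_concurrency)
    fetched = []

    async def fetch_one(url):
        async with sem:
            await asyncio.sleep(delay)  # stand-in for the HTTP request + politeness delay
            fetched.append(url)

    await asyncio.gather(*(fetch_one(u) for u in urls))
    return fetched
```

The same two parameters cover most "don't accidentally DDoS someone's blog" scenarios: lower the concurrency for small servers, raise the delay for rate-limited ones.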
The SEO analysis layer runs 16 automated checks per page. Missing titles, duplicate meta descriptions, noindex directives, thin content, URLs that are too long. Standard auditing stuff, honestly, but it’s comprehensive enough to cover the issues that actually show up in client work. Results export as CSV or plain TXT, which means you can drop them into a Slack message or a client report without reformatting anything.
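None of these checks are exotic, which is part of the point. A toy version of the title check, using only the standard library, looks something like this; the 60-character threshold is a common rule of thumb, not Crawler.sh's documented limit.

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collect the text inside the page's <title> tag."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def audit_title(html: str, max_len: int = 60) -> list[str]:
    # Flag a missing or overlong <title>; threshold is a hypothetical default.
    parser = TitleGrabber()
    parser.feed(html)
    title = parser.title.strip()
    issues = []
    if not title:
        issues.append("missing title")
    elif len(title) > max_len:
        issues.append(f"title too long ({len(title)} chars)")
    return issues
```

Multiply that by sixteen rules and a CSV writer and you have the shape of the auditing layer.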
The content extraction piece is the more interesting product decision. Crawler.sh pulls the main article content from any page and converts it to clean Markdown automatically, including word count, author byline, and excerpt. That’s not a typical SEO feature. That’s a feature built for people who are feeding content into AI pipelines, doing competitive research, or archiving sites before a migration. It’s also where the AEO angle lives. As AI answer engines increasingly read structured content rather than rendered HTML, having clean Markdown output matters in a way it didn’t three years ago. We’ve covered how robots are increasingly the primary audience for your web content, and tools that acknowledge that shift feel more honest about the current moment.
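Once the content is clean Markdown, the metadata mentioned above falls out almost for free. A naive sketch of deriving a word count and excerpt from already-extracted Markdown (the field names here are assumptions, not Crawler.sh's schema, and the whitespace split counts Markdown markers as words):

```python
def summarize_markdown(md: str, excerpt_words: int = 25) -> dict:
    # Naive word count and excerpt over extracted Markdown text.
    words = md.split()
    return {
        "word_count": len(words),
        "excerpt": " ".join(words[:excerpt_words]),
    }
```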
It got solid traction on launch day, which tracks. The developer community has been vocal about wanting local tooling that doesn’t require a credit card.
Output formats include NDJSON, JSON arrays, sitemap XML, CSV, and TXT. NDJSON streaming during the crawl is a nice touch for anyone building a pipeline around this: results arrive record by record instead of in one blob at the end.
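NDJSON is just one JSON object per line, so consuming the stream takes a few lines of standard-library code. The field names below (`url`, `title`) are assumptions for illustration; Crawler.sh's actual record schema isn't documented here.

```python
import json
from typing import IO, Iterator

def read_ndjson(stream: IO[str]) -> Iterator[dict]:
    """Yield one record per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

def pages_missing_titles(stream: IO[str]) -> list[str]:
    # Hypothetical pipeline step: collect URLs whose record lacks a title.
    return [rec["url"] for rec in read_ndjson(stream) if not rec.get("title")]
```

Because records stream in as the crawl runs, a pipeline like this can start flagging issues before the crawl finishes.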
It’s currently free. Version 0.2.3 just added custom user-agent support.
The Verdict
I think this is a genuinely useful tool for a specific kind of user: developers who do SEO work, technical SEO practitioners who want local control, and anyone building content pipelines who needs clean structured output from arbitrary URLs. That’s not a small group.
The free pricing is both the most attractive thing about it and the most pressing question. Free tools in this category either grow into a paid tier, find a different monetization angle, or stall. At 30 days, I’d want to see what the paid roadmap looks like, because the current feature set is coherent enough that a pro tier makes obvious sense. Team collaboration, scheduled crawls, integrations. There’s a clear path.
At 60 days, the question is whether the Markdown extraction and AEO framing bring in a meaningfully different user than the typical SEO auditing crowd. Those two audiences want different things from a tool like this, and serving both cleanly is harder than it looks.
What I’d actually want to know: who’s building workflows around the NDJSON output, and whether the CLI is stable enough for production use in automated pipelines. That’s where the real defensibility would live.
For now, it’s a sharp, honest tool that doesn’t oversell itself. That’s rarer than it should be.