Hi all,
Long-time lurker, first post. I wanted to share two free Windows desktop tools I built on top of Common Crawl's Web Graph data, and the processing approach I used to get there on a single consumer machine.
The tools:
Tom's AI Rank Checker (free) — checks any domain's position in the Common Crawl web graph using Harmonic Centrality scores. Covers 120,025,754 domains from the cc-main-2026-jan-feb-mar crawl. Includes an AI Visibility Score, tier badges (Elite through Long Tail), batch checking, and a PHP-based online checker at tomdahne.com. The idea came from spotting CCBot in my server logs at 1am and wondering what it was actually doing.
Tom's Link Authority (free base + optional data shards) — offline backlink checker and link gap analyser built from the same crawl. Returns up to 100,000 referring domains per lookup, with authority scores for each linking domain. Covers 4.4 billion link connections across 27 SQLite shard databases keyed by first letter of domain.
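For anyone who hasn't met Harmonic Centrality before, here is a minimal sketch of the definition on a tiny in-memory graph: a domain's score is the sum of 1/distance over every other domain that can reach it by following links. The `in_links` layout and the function name are made up for illustration. This is a toy version of the metric, not how scores are produced at 120 million domains, where a BFS per node would be impractical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Harmonic centrality of `target` in a directed graph:
//   H(target) = sum over all u != target of 1 / d(u, target)
// where d is the shortest path length in hops and unreachable nodes add 0.
// in_links[v] lists the nodes that link TO v (hypothetical layout).
double harmonic_centrality(const std::vector<std::vector<std::size_t>>& in_links,
                           std::size_t target) {
    const std::size_t n = in_links.size();
    if (target >= n) return 0.0;  // unknown node: nothing can reach it

    std::vector<int> dist(n, -1);  // -1 means "not reached yet"
    std::queue<std::size_t> frontier;
    dist[target] = 0;
    frontier.push(target);

    double score = 0.0;
    // BFS backwards along links, so dist[u] is the hop count from u to target.
    while (!frontier.empty()) {
        const std::size_t v = frontier.front();
        frontier.pop();
        for (std::size_t u : in_links[v]) {
            if (dist[u] != -1) continue;  // already reached by a shorter path
            dist[u] = dist[v] + 1;
            score += 1.0 / dist[u];
            frontier.push(u);
        }
    }
    return score;
}
```

With links 0→2, 1→2 and 3→0, node 2 is reached by two nodes at distance 1 and one node at distance 2, so it scores higher than node 0, which is reached by a single node.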
The processing approach:
Rather than Spark or AWS EMR, I processed the WAT files locally on a Windows machine (AMD Ryzen 5 7600, 32GB RAM) using a custom C++ processor I built called cc_rank_processor. The key insight that unlocked practical performance was processing one database at a time, sequentially, with no PRIMARY KEY on the edge table during inserts. That avoided the B-tree write saturation that had been causing 58+ hour ETAs in earlier versions.
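As a rough sketch of that ordering, the statement sequence for one shard might look like the following. The table, column and index names are hypothetical, and the deferred index at the end is an assumption about how lookups would be served afterwards. It is only the order of SQL statements, not the actual cc_rank_processor code.

```cpp
#include <string>
#include <vector>

// Returns the SQL statements for a bulk edge load, in execution order.
// All table, column and index names are hypothetical.
std::vector<std::string> bulk_load_plan() {
    return {
        // 1. Plain table with no PRIMARY KEY or UNIQUE constraint, so each
        //    insert appends to the rowid table rather than also updating a
        //    separate key index in random order.
        "CREATE TABLE edges (src_id INTEGER NOT NULL, dst_id INTEGER NOT NULL);",
        // 2. One transaction around the whole batch, not one per row.
        "BEGIN;",
        // 3. Prepared once, then bound and stepped for every edge.
        "INSERT INTO edges (src_id, dst_id) VALUES (?1, ?2);",
        "COMMIT;",
        // 4. Build the lookup index once, after all rows are in (assumed step).
        "CREATE INDEX idx_edges_dst ON edges (dst_id);",
    };
}
```

The point of the ordering is that the expensive sorted structure is built in one pass at the end instead of being rebalanced on every insert.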
Stats from the cc-main-2026-jan-feb-mar processing run:
The shards are hosted on Cloudflare R2 with 72-hour expiring signed URLs for distribution. Machine-locked licence keys prevent shard sharing.
Why I built it this way:
My philosophy across all my tools is zero external dependencies, single portable EXE, fully offline, and either one-time pricing or free. The CC web graph was a perfect fit: it is built from the same crawl data that feeds AI model training, which means domain authority in CC is increasingly relevant as a proxy for AI search visibility, not just traditional SEO.
Both tools are free to download at tomdahne.com. Happy to answer questions about the processing pipeline, the SQLite shard architecture, or the Harmonic Centrality implementation.
Tom