How to get a list of domain names, 200 million of them, plain text, domain only, no other data


Bruno A. H. Vincent (Webmaster & Tech)

Aug 30, 2025, 10:40:17 PM
to Common Crawl
I've been fighting with AI for 3 days, hundreds of attempts.

Maybe a human can help me?

I'm running Windows 11 on an i3-12100 with 32 GB of RAM and good internet.

This is what I'm trying to do:

Fetch domains from Common Crawl, then have them in a list like this:

a.com
b.com

I only want extensions common in Western countries, like .com, .org, .net.

I have Git Bash installed and can use PowerShell, Python, curl, whatever is required.

How can I do this? I'm at a loss!


Greg Lindahl

Aug 30, 2025, 11:57:58 PM
to common...@googlegroups.com
Bruno,

Our webgraph has that list as a separate file. I'd agree with you that this is unfortunately hard to discover!

-- greg
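
For readers following along, here is a rough Python sketch of how such a domain file might be turned into the plain list Bruno asked for. It rests on assumptions that are not confirmed in this thread: that the file is gzip-compressed text, one domain per line, tab-separated, with the domain in the second column written in reversed notation (com.example rather than example.com). The file names are placeholders, not real Common Crawl paths. Check a few lines of the actual file before relying on this.

```python
import gzip
from typing import Iterable, Iterator

# TLDs to keep, taken from the original question.
WANTED_TLDS = {"com", "org", "net"}


def domains_from_vertices(lines: Iterable[str], tlds=WANTED_TLDS) -> Iterator[str]:
    """Yield plain domain names whose TLD is in `tlds`.

    ASSUMPTION: each line looks like "<id>\t<reversed domain>\t...",
    e.g. "0\tcom.example\t1". Lines with fewer than two fields are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        labels = fields[1].split(".")
        # In reversed notation the first label is the TLD.
        if labels[0] in tlds:
            yield ".".join(reversed(labels))


def convert(src_path: str, dst_path: str) -> None:
    """Stream a gzip-compressed input file into a one-domain-per-line text file."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for domain in domains_from_vertices(src):
            dst.write(domain + "\n")


if __name__ == "__main__":
    # Hypothetical file names; substitute the file you actually downloaded.
    convert("domain-vertices.txt.gz", "domains.txt")
```

Because the file is read line by line, memory use stays small even for a couple of hundred million rows, so 32 GB of RAM is far more than this needs.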




Bruno A. H. Vincent (Webmaster & Tech)

Aug 31, 2025, 9:14:59 PM
to Common Crawl
Thanks, what URL is this at?

"Our webgraph has that list as a separate file. I'd agree with you that this is unfortunately hard to discover!"

Is it a text file?

Greg Lindahl

Aug 31, 2025, 9:18:33 PM
to common...@googlegroups.com