How to get a list of domain names, 200 million of them, plain text, domain only, no other data

Bruno A. H. Vincent (Webmaster & Tech)

Aug 30, 2025, 10:40:17 PM
to Common Crawl
I've been fighting with AI for 3 days, through hundreds of attempts.

Maybe a human can help me?

I'm running Windows 11 on an i3-12100 with 32 GB of RAM and a good internet connection.

This is what I'm trying to do:

Fetch domains from Common Crawl, then have them in a plain-text list like this:

a.com
b.com

I only want the extensions (TLDs) used in Western countries, like .com, .org, .net

I have Git Bash installed and can use PowerShell, Python, curl, or whatever is required.

How can I do this? I'm at a loss!


Greg Lindahl

Aug 30, 2025, 11:57:58 PM
to common...@googlegroups.com
Bruno,

Our webgraph has that list as a separate file. I'd agree with you that this is unfortunately hard to discover!

-- greg
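
For readers following along, here is a minimal Python sketch of how such a list could be turned into the plain "a.com" format asked for above. It rests on assumptions the thread does not confirm: that the webgraph domain file is a gzip-compressed, tab-separated text file whose second column holds the domain in reversed form (for example "com.a" for "a.com"), and that it has already been downloaded locally. The names domains_from_vertices, extract, and WANTED_TLDS, along with any file paths passed in, are hypothetical.

```python
import gzip
from typing import Iterable, Iterator

# Assumption: the TLDs to keep, taken from the examples in the question.
WANTED_TLDS = frozenset({"com", "org", "net"})


def domains_from_vertices(
    lines: Iterable[str], tlds: frozenset = WANTED_TLDS
) -> Iterator[str]:
    """Yield plain domain names from tab-separated vertex lines.

    Assumed line layout (not confirmed in the thread):
        <id> TAB <reversed domain, e.g. com.example> TAB <other columns>
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not match the assumed layout
        labels = fields[1].split(".")
        # In reversed notation the TLD is the first label.
        if len(labels) >= 2 and labels[0] in tlds:
            yield ".".join(reversed(labels))


def extract(src_path: str, dst_path: str) -> None:
    """Stream a gzip-compressed vertices file into a one-domain-per-line text file."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, open(
        dst_path, "w", encoding="utf-8", newline="\n"
    ) as dst:
        for domain in domains_from_vertices(src):
            dst.write(domain + "\n")
```

Because the file is streamed line by line, memory use should stay small even for hundreds of millions of entries. A call such as extract("domain-vertices.txt.gz", "domains.txt") would apply it to a local copy, where both file names are placeholders.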




Bruno A. H. Vincent (Webmaster & Tech)

Aug 31, 2025, 9:14:59 PM
to Common Crawl
Thanks, what URL is this at?


"Our webgraph has that list as a separate file. I'd agree with you that this is unfortunately hard to discover!"

Is it a text file?

Greg Lindahl

Aug 31, 2025, 9:18:33 PM
to common...@googlegroups.com