504 error at index

Ayrton Vu-Guerin

Jun 4, 2026, 10:08:06 AM
to Common Crawl

Tom Morris

Jun 4, 2026, 10:11:59 AM
to common...@googlegroups.com
On Thu, Jun 4, 2026 at 10:08 AM Ayrton Vu-Guerin <kiwia...@gmail.com> wrote:
If you look at https://index.commoncrawl.org/, you'll see that it says:

Please do not overload the URL index server. For bulk downloads (e.g. all records of the entire .com top-level domain), see the download instructions. The Columnar Index is a better fit for bulk filtering and aggregation.

The index server is intended for casual use shared among the entire community. If you want to do a search that returns such a large result set, you should download the index data and do the search locally. You can find those files here: https://data.commoncrawl.org/cc-index/collections/index.html

You may also find the columnar index useful: https://data.commoncrawl.org/cc-index/table/cc-main/index.html
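
For anyone wanting to try the local search suggested above, here is a minimal sketch in Python. It assumes each line of a downloaded CDX index file has the layout "SURT key, timestamp, JSON object", and that the JSON carries "offset", "length" and "filename" fields pointing into a WARC file. The helper names (surt_prefix, parse_cdx_line, records_for_domain, byte_range) are made up for illustration and are not part of any Common Crawl tool.

```python
import json
from typing import Dict, Iterable, Iterator


def surt_prefix(domain: str) -> str:
    """Turn 'youtube.com' into the SURT host form 'com,youtube'."""
    return ",".join(reversed(domain.lower().split(".")))


def parse_cdx_line(line: str) -> Dict[str, str]:
    """Split one CDX line into SURT key, timestamp, and its JSON fields.

    Assumed layout: '<surt key> <timestamp> <json object>'.
    """
    key, timestamp, payload = line.rstrip("\n").split(" ", 2)
    record = json.loads(payload)
    record["urlkey"] = key
    record["timestamp"] = timestamp
    return record


def records_for_domain(lines: Iterable[str], domain: str) -> Iterator[Dict[str, str]]:
    """Yield parsed records whose host is the domain or one of its subdomains."""
    prefix = surt_prefix(domain)
    for line in lines:
        # In a SURT key the host part ends at the first ')'.
        host = line.split(")", 1)[0]
        if host == prefix or host.startswith(prefix + ","):
            yield parse_cdx_line(line)


def byte_range(record: Dict[str, str]) -> str:
    """Build an HTTP Range header value covering exactly one WARC record."""
    offset = int(record["offset"])
    length = int(record["length"])
    return f"bytes={offset}-{offset + length - 1}"
```

To use it, open a downloaded index file with gzip.open(path, "rt") and pass the file object to records_for_domain(f, "youtube.com"). Each record's "filename" together with byte_range(record) can then be used in a ranged HTTP request against https://data.commoncrawl.org/ to fetch only that WARC record, which keeps the load off the shared index server.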

Tom 

Ayrton Vu-Guerin

Jun 14, 2026, 10:34:29 AM
to common...@googlegroups.com
Thank you for the response. This was for a university assignment, and I just wanted to get the WARC files for the domain youtube.com. I have since managed to do so.
