504 error at index

Ayrton Vu-Guerin

Jun 4, 2026, 10:08:06 AM
to Common Crawl

Tom Morris

Jun 4, 2026, 10:11:59 AM
to common...@googlegroups.com
On Thu, Jun 4, 2026 at 10:08 AM Ayrton Vu-Guerin <kiwia...@gmail.com> wrote:
If you look at https://index.commoncrawl.org/, you'll see that it says:

Please do not overload the URL index server. For bulk downloads (e.g. all records of the entire .com top-level domain), see the download instructions. The Columnar Index is a better fit for bulk filtering and aggregation.

The index server is intended for casual use and is shared by the entire community. If you want to do a search that returns such a large result set, you should download the index data and do the search locally. You can find those files here: https://data.commoncrawl.org/cc-index/collections/index.html
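
For what it's worth, here is a minimal sketch of what "search locally" could look like once you have downloaded one of the gzipped index shards. It assumes each line has the form `<SURT key> <timestamp> <JSON record>`, which is how the CDX shards are usually laid out, but check a few lines of your own download first. The function names and the local file name `cdx-00000.gz` are just placeholders for illustration.

```python
import gzip
import json
from typing import Iterable, Iterator, Tuple


def iter_matches(lines: Iterable[str], surt_prefix: str) -> Iterator[Tuple[str, str, dict]]:
    """Yield (surt_key, timestamp, record) for lines whose key starts with surt_prefix.

    Assumes each line looks like: '<SURT key> <timestamp> <JSON object>'.
    Lines that do not split into three fields are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split(" ", 2)  # the JSON part may itself contain spaces
        if len(parts) != 3:
            continue
        key, timestamp, payload = parts
        if key.startswith(surt_prefix):
            yield key, timestamp, json.loads(payload)


def search_shard(path: str, surt_prefix: str) -> Iterator[Tuple[str, str, dict]]:
    """Stream one locally downloaded, gzipped index shard and filter it."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from iter_matches(fh, surt_prefix)


if __name__ == "__main__":
    # 'cdx-00000.gz' is a hypothetical local file name; 'com,example)' is the
    # SURT form of example.com.
    for key, ts, rec in search_shard("cdx-00000.gz", "com,example)"):
        print(ts, rec.get("url"))
```

This streams the file line by line, so memory use stays small even for large shards, and none of the work touches the shared index server.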

You may also find the columnar index useful: https://data.commoncrawl.org/cc-index/table/cc-main/index.html

Tom 