URL indexes for 2012 - 2014

Sebastian Nagel

Oct 13, 2016, 7:14:34 AM
to common...@googlegroups.com
Hi everyone,

a wish often expressed on this list has been to have URL indexes available for older crawl archives as well.

We've started to generate indexes for the crawl archives of 2013 and 2014:
- indexes for the two 2013 crawls are ready
- some indexes for the monthly crawls in 2014 are ready as well
- the remaining 2014 indexes will be available early in November

As usual, you can access the URL indexes at
http://index.commoncrawl.org/
or get them from AWS S3 under the prefix
s3://commoncrawl/cc-index/collections/
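For anyone who wants to script lookups against the index server, here is a minimal sketch in Python. It assumes the server's usual query form (`<collection>-index?url=...&output=json`) and a response with one JSON object per line. The collection ID `CC-MAIN-2013-20` and the helper names are assumptions for illustration, so please check the collection list on the index server for the exact IDs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

INDEX_SERVER = "http://index.commoncrawl.org"  # server named in this post


def build_query_url(collection, url_pattern, output="json"):
    """Build a lookup URL for one collection (hypothetical helper).

    Assumes the '<collection>-index' endpoint with 'url' and 'output'
    query parameters.
    """
    params = urlencode({"url": url_pattern, "output": output})
    return f"{INDEX_SERVER}/{collection}-index?{params}"


def parse_records(body):
    """Parse a response body that holds one JSON object per line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def fetch_records(collection, url_pattern):
    """Query the index server over HTTP and return the parsed records."""
    with urlopen(build_query_url(collection, url_pattern)) as response:
        return parse_records(response.read().decode("utf-8"))
```

Calling `fetch_records("CC-MAIN-2013-20", "commoncrawl.org/*")` would then return a list of dicts, one per capture, provided the collection ID exists on the server.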

In addition, the old URL index server for the 2012 crawl archives is up again. For the next few days
it's temporarily reachable at
http://ec2-54-221-249-42.compute-1.amazonaws.com/
but we'll move it back to
http://urlsearch.commoncrawl.org/
If in doubt, please try both URLs for the time being.
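If you access the 2012 index server from a script, a small fallback like the following sketch may help while the address is in flux. It only checks which of the two URLs answers. The function names and the 10-second timeout are assumptions chosen for illustration.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Both addresses from this post, the permanent one first.
CANDIDATES = [
    "http://urlsearch.commoncrawl.org/",
    "http://ec2-54-221-249-42.compute-1.amazonaws.com/",
]


def is_reachable(url, timeout=10):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Covers HTTP errors, DNS failures, refused connections, timeouts.
        return False


def first_reachable(urls, probe=is_reachable):
    """Return the first URL for which probe(url) is true, else None."""
    for url in urls:
        if probe(url):
            return url
    return None
```

`first_reachable(CANDIDATES)` returns whichever address currently responds, or `None` if neither does.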

Best,
Sebastian