December 2017 Crawl Archive Now Available

Sebastian Nagel

Dec 22, 2017, 12:10:28 PM
to common...@googlegroups.com
Hi all,

the December 2017 crawl archive is now available. The crawl ran from Dec 10 to Dec 19, 2017,
and covers 2.9 billion web pages, or 244 TiB of uncompressed content. More details about the crawl
and information on how to access and use the data can be found on our blog [1].

You'll find statistics and metrics about the current and previous crawls at [2].

The URL index of the December crawl is available at [3]. Please note that a fix for
an issue [4] affecting the index API has been deployed together with the December index.
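
For readers who want to try the index API, here is a minimal Python sketch of a lookup against the December index. It rests on assumptions not stated in this announcement: that the query endpoint is the crawl ID from [3] with an "-index" suffix, and that it accepts the `url`, `output`, and `limit` parameters of the pywb CDX server API. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the query endpoint is the crawl ID from [3] plus "-index".
INDEX_ENDPOINT = "http://index.commoncrawl.org/CC-MAIN-2017-51-index"


def build_query_url(url_pattern, limit=5):
    """Build a lookup URL for a URL pattern such as 'commoncrawl.org/*'."""
    # Assumption: 'url', 'output' and 'limit' follow the pywb CDX server API.
    params = {"url": url_pattern, "output": "json", "limit": limit}
    return INDEX_ENDPOINT + "?" + urlencode(params)


def parse_records(body):
    """Parse a response body holding one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def fetch_records(url_pattern, limit=5):
    """Query the index over HTTP and return the parsed capture records."""
    with urlopen(build_query_url(url_pattern, limit)) as response:
        return parse_records(response.read().decode("utf-8"))
```

Calling `fetch_records("commoncrawl.org/*")` would issue one HTTP request and return a list of dictionaries, one per capture. The exact fields in each record depend on the index server, so check a real response before relying on any of them.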


Best and happy holidays,
Sebastian


[1] http://commoncrawl.org/2017/12/december-2017-crawl-archive-now-available/
[2] https://commoncrawl.github.io/cc-crawl-statistics/
[3] http://index.commoncrawl.org/CC-MAIN-2017-51/
[4] https://github.com/ikreymer/pywb/issues/249