Release of URL index in columnar format + February 2018 crawl

Sebastian Nagel

Mar 2, 2018, 12:15:56 PM
to common...@googlegroups.com
Dear Common Crawl users,

today, we're glad to announce two releases of new data:

1. the URL index in columnar format is ready for production
and now contains the latest 5 monthly crawl archives
(Oct 2017 - Feb 2018). More information can be found on
our blog at
http://commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/

2. the February 2018 crawl archives are now available with
3.4 billion web pages, or 270+ TiB of uncompressed content.
For further details and links to access the data, please
check
http://commoncrawl.org/2018/03/february-2018-crawl-archive-now-available/

Best,
Sebastian