Common Crawl
Release of URL index in columnar format + February 2018 crawl
Sebastian Nagel
Mar 2, 2018, 12:15:56 PM
to common...@googlegroups.com
Dear Common Crawl users,
today, we're glad to announce two new data releases:
1. The URL index in columnar format is ready for production
   and now covers the five latest monthly crawl archives
   (Oct 2017 - Feb 2018). More information can be found on
   our blog at
   http://commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/
2. The February 2018 crawl archive is now available, with
   3.4 billion web pages or 270+ TiB of uncompressed content.
   For further details and links to access the data, please
   check
   http://commoncrawl.org/2018/03/february-2018-crawl-archive-now-available/
Best,
Sebastian