June/July 2022 crawl archive now available


Sebastian Nagel

Jul 13, 2022, 11:18:27 AM
to common...@googlegroups.com
Hi all,

The crawl archives for June/July 2022 are now available. The data was
crawled June 24 – July 7 and contains 3.1 billion web pages or 370 TiB
of uncompressed content. Page captures are from 44 million hosts or 35
million registered domains and include 1.4 billion new URLs, not
visited in any of our prior crawls.

As usual, more details about the crawl and information on how to access
and use the data can be found on the Common Crawl blog [1].

Best,
Sebastian

[1] https://commoncrawl.org/2022/07/june-july-2022-crawl-archive-now-available/