The crawl archives of June/July 2022 are now available. The data was
crawled June 24 – July 7 and contains 3.1 billion web pages or 370 TiB
of uncompressed content. Page captures are from 44 million hosts or 35
million registered domains and include 1.4 billion new URLs, not
visited in any of our prior crawls.
As usual, more details about the crawl and information on how to access
and use the data can be found on the Common Crawl blog.