July 2024 Crawl Release now available, including some changes

Thom Vaughan

Jul 28, 2024, 5:29:16 PM
to Common Crawl
Greetings all,

We're happy to announce that the July 2024 crawl (CC-MAIN-2024-30) is now available, containing 2.5 billion web pages, or 360 TiB of uncompressed content. This crawl introduces two new WARC headers containing information about the HTTP protocol; please see the crawl description and our announcement post for more information.

Please get in touch if you have any questions or comments. Enjoy!

TV