The April 2026 crawl consists of 2.19 billion web pages (or 379.2 TiB of
uncompressed content). Captures are from 43.2 million hosts or 35.4 million registered domains and include 660.5 million new URLs, not visited
in any of our prior crawls.
Starting with this crawl, revisit records use the WARC header
Content-Type: application/http;msgtype=response (previously
message/http). This follows iipc/warc-specifications#55 and makes revisit
records consistent with other HTTP response records.
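
For consumers that filter revisit records by Content-Type, a minimal sketch of a check that accepts both the new and the previous value might look like the following. The function names and the sample record are hypothetical and written only for illustration; they are not part of any Common Crawl tooling, and the parser handles just the simple one-line header fields shown here.

```python
# Illustrative sketch only: names and the sample record are hypothetical.
NEW_TYPE = "application/http;msgtype=response"  # used from this crawl onward
OLD_TYPE = "message/http"                       # used in earlier crawls

# A made-up WARC header block for demonstration (not taken from a real crawl).
SAMPLE = (
    "WARC/1.1\r\n"
    "WARC-Type: revisit\r\n"
    "WARC-Target-URI: https://example.com/\r\n"
    "Content-Type: application/http;msgtype=response\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_warc_headers(header_block):
    """Parse the named fields of one WARC record header into a dict."""
    headers = {}
    for line in header_block.splitlines()[1:]:  # skip the version line
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def normalize_media_type(value):
    """Lowercase and drop whitespace so '...; msgtype=response' also matches."""
    return "".join(value.lower().split())


def is_http_revisit(headers):
    """True for revisit records carrying either the new or the old Content-Type."""
    if headers.get("warc-type") != "revisit":
        return False
    media_type = normalize_media_type(headers.get("content-type", ""))
    return media_type in {NEW_TYPE, OLD_TYPE}


if __name__ == "__main__":
    print(is_http_revisit(parse_warc_headers(SAMPLE)))
```

Accepting both values keeps a pipeline working across crawls published before and after the change.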
The corresponding Web Graph release consists of 269.0 million nodes and
9.4 billion edges at the host level, and 124.6 million nodes and
4.8 billion edges at the domain level.
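
As a rough sense of graph density, the node and edge counts above can be turned into an average number of edges per node. The sketch below uses only the rounded figures quoted in this announcement, so the ratios are approximations and not official statistics; the names GRAPHS and mean_edges_per_node are hypothetical.

```python
# Rounded figures copied from the announcement; derived ratios are approximate.
GRAPHS = {
    "host": {"nodes": 269.0e6, "edges": 9.4e9},
    "domain": {"nodes": 124.6e6, "edges": 4.8e9},
}


def mean_edges_per_node(nodes, edges):
    """Average number of edges per node (total edges divided by total nodes)."""
    return edges / nodes


if __name__ == "__main__":
    for level, graph in GRAPHS.items():
        ratio = mean_edges_per_node(graph["nodes"], graph["edges"])
        print(f"{level} level: about {ratio:.1f} edges per node")
```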