Host- and domain-level web graphs and rankings


Ed Coughlan

Apr 27, 2022, 4:58:16 AM
to Common Crawl
Hello all,

If I'm not mistaken, a flat file of all 90 million domain ranks is available for download every 4 months.

Is it possible to access this data in this format on a more regular cadence, say, monthly?

Thank you,

Ed


Sebastian Nagel

Apr 27, 2022, 9:52:24 AM
to common...@googlegroups.com
Hi Ed,

> If I'm not mistaken, a flat file of all 90 million domain ranks is
> available for download every 4 months.

Strictly speaking a new webgraph release is built on three main crawls,
so the release frequency of the graphs depends mostly on the release
frequency of the crawls.

> Is it possible to access this data in this format on a more regular
> cadence, say, monthly?

It's not about access, it's about constructing the graphs.

It doesn't really make sense to build the graphs from
a single main crawl: less data would make the graphs
smaller (fewer domains) and less connected, which would make
the rankings less reliable. Of course, a rolling-window approach
could be a possible solution.

However, for now there are no plans to construct the graphs and rankings
in shorter intervals.
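
For illustration only, here is a minimal sketch of what a rolling window over main crawls could look like: each graph would be built from the three most recent crawls, and the window would advance by one crawl at a time. The function name `rolling_windows` and the crawl labels are hypothetical placeholders, not part of any existing Common Crawl tooling.

```python
from typing import Iterator, Sequence, Tuple


def rolling_windows(crawls: Sequence[str], size: int = 3) -> Iterator[Tuple[str, ...]]:
    """Yield every run of `size` consecutive crawls, advancing one crawl at a time."""
    for end in range(size, len(crawls) + 1):
        yield tuple(crawls[end - size:end])


# Placeholder crawl labels, oldest first (assumed for the example, not real crawl IDs).
crawls = ["crawl-1", "crawl-2", "crawl-3", "crawl-4", "crawl-5"]

windows = list(rolling_windows(crawls))
for window in windows:
    # Each window is the set of crawls one graph release would be built from.
    print(" + ".join(window))
```

With such a scheme every new main crawl would produce a new graph, while each graph would still draw on three crawls' worth of data; consecutive graphs would share two of their three input crawls.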

Best,
Sebastian

Ed Coughlan

Apr 27, 2022, 10:23:00 AM
to common...@googlegroups.com
Thank you Sebastian for the very comprehensive (as always) reply.
