Question on the host-level web graph

Akash

Jul 19, 2017, 8:48:06 AM
to Common Crawl
Hi Sebastian,

I'm very impressed by the host-level web graph downloaded from http://commoncrawl.org/2017/05/hostgraph-2017-feb-mar-apr-crawls/ (especially the Harmonic Centrality ranks).

I understand this data is taken from the crawls of three months (Feb, Mar, Apr). Is this data released every three months? That is, will there be a new update for the data crawled in May, June and July? If so, any idea when it will be released?

Thanks,
Akash

Sebastian Nagel

Jul 19, 2017, 9:44:08 AM
to common...@googlegroups.com
Hi Akash,

It's not 100% clear yet how we will release the web graph data sets in the
future. But for now, the plan is to release them as you suggested. The July
crawl is in preparation and should be ready by July 31. After that we'll
build the web graph based on the May/June/July crawls. Eventually, we'll
also add a domain-level graph and rankings. The release should be
approximately in the second week of August.

Best,
Sebastian

Akash

Jul 19, 2017, 10:39:44 AM
to Common Crawl
That sounds great, Sebastian. Thanks for letting me know.

Akash

