Common Crawl for SEOs


Wcluddy

Jun 18, 2022, 12:12:24 AM
to Common Crawl
Hey Tech Buddies, I can completely sense the power of the Common Crawl dataset for the SEO community. But as a beginner I'm a bit confused about how to start digging into this data so I can build something useful for SEOs!

To be precise, let's assume I want to build a simple backlink checker. What would be my first step toward building a tool that can analyse the data and connect the dots between domains?

I'm looking for some beginner resources to get started with the data!

Looking forward to an in-depth answer!

Thanks!

Netanel Baruch

Jun 18, 2022, 5:27:35 AM
to common...@googlegroups.com
Hey! 
Can I get some more information about it? It can be very useful for us 


Sebastian Nagel

Jun 23, 2022, 2:38:48 AM
to common...@googlegroups.com
Hi,

> To be precise let's assume I want to build simple backlinks checker.

> find the dots between the domains.

The easiest way would be to use the host/domain-level webgraphs.
See [1].

Note that these webgraphs do not contain information
- about the number of links between hosts or domains or
- on which page a link was found
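
To make the webgraph route concrete, here is a minimal sketch of a domain-level backlink lookup. It assumes the vertices file has tab-separated lines of the form "id, reversed domain name, optional extra columns" and the edges file has lines of the form "from_id, to_id"; please verify this against the files linked from [1]. The function names and the sample data are made up for illustration.

```python
def reverse_name(name):
    """Convert 'org.example' to 'example.org' and back (reversed-domain notation)."""
    return ".".join(reversed(name.split(".")))


def find_backlinks(vertex_lines, edge_lines, target_domain):
    """Return the sorted list of domains that link to target_domain.

    Assumed line formats (check them against the real files):
      vertices: id <TAB> reversed domain name [<TAB> extra columns]
      edges:    from_id <TAB> to_id
    """
    wanted = reverse_name(target_domain)
    id_to_name = {}
    target_id = None
    for line in vertex_lines:
        fields = line.rstrip("\n").split("\t")
        node_id, name = fields[0], fields[1]
        id_to_name[node_id] = name
        if name == wanted:
            target_id = node_id
    if target_id is None:
        return []  # the domain is not a node in this graph

    sources = set()
    for line in edge_lines:
        from_id, to_id = line.split()[:2]
        # An edge from A to B means some page on A links to some page on B.
        if to_id == target_id and from_id != target_id:
            sources.add(from_id)
    return sorted(reverse_name(id_to_name[s]) for s in sources)


# Tiny made-up sample in the assumed format.
sample_vertices = ["0\tcom.example\t1", "1\torg.commoncrawl\t3", "2\tnet.blog\t1"]
sample_edges = ["0\t1", "2\t1", "1\t0"]
print(find_backlinks(sample_vertices, sample_edges, "commoncrawl.org"))
```

For the real files you would pass open file handles instead of lists, e.g. from gzip.open(path, "rt"). The sketch keeps the whole vertex map in memory, which is fine for a sample but probably too heavy for the full domain graph without a second pass or a database.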


If you want to extract page-level links, you'd probably start
with the WAT files. But be aware that there are many page-level
links: 500 billion or more in a single monthly crawl.
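
As a rough sketch of the WAT route: each WAT record carries a JSON document, and the snippet below pulls (source page, target page) pairs out of one such document after it has been parsed. The key names ("Envelope", "Payload-Metadata", "HTML-Metadata", "Links", and the "A@/href" path) are what I'd expect from the WAT layout, but treat them as assumptions and check them against a real record. Reading the WAT file itself (e.g. with a WARC library) is left out, and the sample record is made up.

```python
from urllib.parse import urljoin, urlsplit


def extract_links(wat_record):
    """Yield (source_url, target_url) for anchor links in one parsed WAT record."""
    envelope = wat_record.get("Envelope", {})
    source = envelope.get("WARC-Header-Metadata", {}).get("WARC-Target-URI")
    links = (
        envelope.get("Payload-Metadata", {})
        .get("HTTP-Response-Metadata", {})
        .get("HTML-Metadata", {})
        .get("Links", [])
    )
    if not source:
        return
    for link in links:
        # Keep only <a href="..."> links; skip images, scripts, etc.
        if link.get("path") != "A@/href" or "url" not in link:
            continue
        target = urljoin(source, link["url"])  # resolve relative URLs
        if urlsplit(target).scheme in ("http", "https"):
            yield source, target


def external_links(wat_record):
    """Keep only links that point to a different host than the source page."""
    return [
        (src, dst)
        for src, dst in extract_links(wat_record)
        if urlsplit(src).hostname != urlsplit(dst).hostname
    ]


# Made-up record in the assumed layout.
sample_record = {
    "Envelope": {
        "WARC-Header-Metadata": {"WARC-Target-URI": "https://example.com/blog/post.html"},
        "Payload-Metadata": {"HTTP-Response-Metadata": {"HTML-Metadata": {"Links": [
            {"path": "A@/href", "url": "/about"},
            {"path": "A@/href", "url": "https://commoncrawl.org/"},
            {"path": "IMG@/src", "url": "logo.png"},
            {"path": "A@/href", "url": "mailto:hi@example.com"},
        ]}}},
    }
}
print(external_links(sample_record))
```

Given the link volume mentioned above, you would likely run something like this as a distributed job and aggregate the pairs by target host or domain, rather than collecting them on one machine.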


You'll find more on this topic if you browse the archives of this
discussion group. Some years ago somebody built a backlink index,
but the project now seems to be abandoned. It's referenced in [2].


Best,
Sebastian

[1]
https://commoncrawl.org/2022/03/host-and-domain-level-web-graphs-oct-nov-jan-2021-2022/
[2]
https://www.reddit.com/r/bigseo/comments/dia7hn/any_good_ahrefs_alternatives_for_link_analysis/