URL Search Tool is down
Maximilian Böhm
Aug 10, 2016, 5:54:34 AM
to Common Crawl
Hi there,
I am currently trying to recover a page for a friend who needs it for her bachelor thesis.
Other sources (Google, the Wayback Machine) no longer have it in their index.
Anyhow, I found the post http://commoncrawl.org/2013/03/url-search-tool/, which refers to http://urlsearch.commoncrawl.org/, but I only get a "502 - Bad Gateway" error.
Is this on purpose? Are there plans to bring it back? It might be helpful for others if this blog post were updated.
Thank you!
Best regards
Max Böhm
PS: As a workaround, I have downloaded all the index files, but I still cannot find the required pages. I would be happy for any support. There is a thread on SO:
http://stackoverflow.com/questions/38869741/commoncrawl-how-to-find-a-specific-web-page
Sebastian Nagel
Aug 10, 2016, 6:09:00 AM
to common...@googlegroups.com
Hi Maximilian,
> http://urlsearch.commoncrawl.org/
Yes, this service is down and it is unclear whether we are able (with reasonable effort)
to bring it back again.
There is a new URL index for the Common Crawl data released since the end of 2014, available at
http://index.commoncrawl.org/
> PS: As a workaround, I have downloaded all the index files, but I still cannot find the required pages.
> I would be happy for any support. There is a thread on SO:
> http://stackoverflow.com/questions/38869741/commoncrawl-how-to-find-a-specific-web-page
I'll answer on SO later; here is the very short answer: you can query index.commoncrawl.org, e.g.,
http://index.commoncrawl.org/CC-MAIN-2016-30-index?url=thesun.co.uk&matchType=domain
(the query syntax is explained on a page linked from there).
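A minimal Python sketch of the same query, using only the standard library. The helper names are hypothetical, and the `output=json` parameter (one JSON record per line) is an assumption about the index server's query syntax, not something stated in this thread, so please check the syntax page linked from index.commoncrawl.org before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the index server mentioned in this thread. The collection
# name (e.g. "CC-MAIN-2016-30") selects the index of one crawl.
INDEX_SERVER = "http://index.commoncrawl.org"


def build_query_url(collection, url, match_type="domain", output="json"):
    """Build a query URL of the same shape as the example above.

    Assumption: output="json" asks the server for one JSON object per line
    instead of plain-text CDX lines.
    """
    params = urllib.parse.urlencode(
        {"url": url, "matchType": match_type, "output": output}
    )
    return f"{INDEX_SERVER}/{collection}-index?{params}"


def parse_json_lines(body):
    """Parse a newline-delimited JSON response into a list of dicts,
    skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def query_index(collection, url, match_type="domain", timeout=60):
    """Fetch and parse index records for `url` (requires network access)."""
    query = build_query_url(collection, url, match_type)
    with urllib.request.urlopen(query, timeout=timeout) as resp:
        return parse_json_lines(resp.read().decode("utf-8"))
```

For example, `query_index("CC-MAIN-2016-30", "thesun.co.uk")` would request the same records as the link above, returned as a list of dicts. Keeping URL building and parsing separate from the network call makes it easy to inspect the generated query first, or to parse a response saved from the browser.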
Best,
Sebastian