Errors searching for single URL


Aline Bessa

Apr 10, 2015, 5:43:13 PM
to common...@googlegroups.com
Hi all


but if I look for http://www.aids-sida.org/archivos/directorio_nacional/ags.html there, it returns a 0. Do you folks know why?

Thanks!

Dominik Stadler

Apr 11, 2015, 3:36:46 PM
to common...@googlegroups.com
Hi,

As the crawl is a snapshot of the state at the time it was collected, the
actual contents at the URLs can change over time. For example, I think what you
see in the crawl here is what the page looked like in Feb. 2012...

Dominik

Aline Bessa

Apr 11, 2015, 4:17:27 PM
to common...@googlegroups.com
Sorry, I may have expressed myself inaccurately. What I mean is that there are pages that I can see in urlsearch.common.org but, when I explicitly look for them INSIDE Common Crawl using Trivio's interface, I cannot fetch them. The API reports that the page wasn't found in Common Crawl's index (although it is listed in urlsearch.common.org, i.e. it IS inside Common Crawl's index).

Suggestions on how to solve it? More generally: suggestions on looking for specific URLs inside Common Crawl in a way that always works?

Thanks!!

John Wiseman

Apr 11, 2015, 11:15:31 PM
to common...@googlegroups.com


ikre...@gmail.com

Apr 11, 2015, 11:24:20 PM
to common...@googlegroups.com
The new index, available at http://index.commoncrawl.org, should have all the URLs for the crawls listed there. There's not yet an index of the old data, but that is planned.
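
For anyone who wants to script a single-URL lookup against that index, here is a minimal sketch in Python. It assumes the index answers CDX-style queries of the form `/<collection>-index?url=...&output=json` and returns one JSON record per line; the collection name `CC-MAIN-2015-11` and the helper names `build_query`, `parse_records` and `lookup` are only illustrative, so check the index page for the crawls that are actually listed.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base address of the index server mentioned above.
INDEX_HOST = "http://index.commoncrawl.org"


def build_query(collection, url):
    """Build the query URL for one page in one crawl collection."""
    params = urlencode({"url": url, "output": "json"})
    return f"{INDEX_HOST}/{collection}-index?{params}"


def parse_records(body):
    """Parse a response body that holds one JSON object per line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def lookup(collection, url):
    """Return the index records for `url`, or an empty list if none are found."""
    try:
        with urlopen(build_query(collection, url)) as resp:
            return parse_records(resp.read().decode("utf-8"))
    except HTTPError as err:
        # Assumption: the server answers 404 when the URL is not in the crawl.
        if err.code == 404:
            return []
        raise


if __name__ == "__main__":
    page = "http://www.aids-sida.org/archivos/directorio_nacional/ags.html"
    for record in lookup("CC-MAIN-2015-11", page):
        print(record)
```

An empty result only means the page is not in that particular crawl, so it may be worth repeating the lookup for each collection the index lists.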

Aline Bessa

Apr 20, 2015, 11:36:31 AM
to common...@googlegroups.com
Thanks!!