Hello,
Hopefully someone out there with knowledge of the actual crawling process can help me out. I'm interested in "discovering" redirects (301, 302, etc.). I'm not sure if this is possible using the Common Crawl index.
From what I can tell, only the final 200 responses are saved and available. Is this true? Is the actual 301 response saved anywhere?
When I query the Common Crawl index, there are no results for the domain originally requested, only for the redirect target.
Here's a command line version using wget. Notice that the 301 redirect from meraki.com to meraki.cisco.com is reported during the fetch.
Is this type of metadata stored anywhere, or is it discarded?
HTTP request sent, awaiting response... 301 Moved Permanently
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’
[ <=> ] 29,580 --.-K/s in 0.02s
2015-10-19 01:17:57 (1.59 MB/s) - ‘index.html’ saved [29580]
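To make clear what I mean by "discovering" a redirect, here is a rough Python sketch of the same check wget performs: request the URL without following redirects automatically, record the status code, and follow the Location header by hand. The names `head` and `trace_redirects` are just my own illustration, and this only observes live redirects; it says nothing about what Common Crawl actually stores.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=head, max_hops=10):
    """Return [(url, status), ...] for each hop, stopping at the first non-redirect."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # The Location header may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Calling `trace_redirects("http://meraki.com/")` and printing the result would list each hop with its status code, which is the information I'm hoping is recorded somewhere in the crawl data.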
Thanks for any advice!!
Soren