CSC326 Project


Nazanin

Nov 30, 2012, 6:48:25 PM
to csc-32...@googlegroups.com
We are trying to test our search engine against multiple pages, such as Wikipedia, and we get HTTP 403/400 error codes. The crawler can't open those pages, so it can't index them. Are we expected to handle HTTP errors?

Thanks!

Wesley May

Nov 30, 2012, 7:12:13 PM
to csc-32...@googlegroups.com
If a page that the crawler tries to access returns an HTTP error, I think it's fine to just ignore that link. If you can't access the page, it won't be of any use anyway.
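Something along these lines should work, assuming the crawler fetches pages with Python's urllib (the function name below is just a placeholder):

import urllib.error
import urllib.request

def fetch_page(url):
    """Return the page body, or None if the page can't be fetched."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # 400/403/404/...: skip this link and move on to the next one.
        print("Skipping %s (HTTP %d)" % (url, e.code))
        return None
    except urllib.error.URLError as e:
        # DNS failure, refused connection, timeout, etc.
        print("Skipping %s (%s)" % (url, e.reason))
        return None

The crawler can then just test the return value for None and carry on with the rest of its queue.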

Nazanin

Dec 2, 2012, 3:35:26 PM
to csc-32...@googlegroups.com
Thanks! That's right, the problem occurs when the crawler tries to open the page; I was actually able to load the same page in the browser.
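For what it's worth, when a browser can load a page but the crawler gets a 403, the difference is often the request headers: some sites reject requests that carry the default library User-Agent. A rough sketch, again assuming urllib (the agent string is made up):

import urllib.request

def fetch_page_with_agent(url):
    # Send an explicit User-Agent so the server doesn't reject the
    # library's default identifier (the string below is just an example).
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "CSC326Crawler/0.1 (course project)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()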