Response object for a 404 HTTP error


Hakim Benoudjit

Apr 15, 2014, 10:51:23 AM
to scrapy...@googlegroups.com
Hi guys,

I have a small issue with the response object inside a request callback when the page returns a 404:
    - If the page exists (HTTP code 200), the response is of type HtmlResponse.
    - If the page returns 404, the response is of type instance, which contains some attributes related to error messages; in this case, status isn't an attribute of the response object.

So I can only tell that the response status is 404 by checking the class of the response object (HtmlResponse or instance).

How do we know that a page returned 404 if response.status isn't available as an attribute of the response object?

Paul Tremberth

Apr 16, 2014, 5:32:01 PM
to scrapy...@googlegroups.com
Hi Hakim,

I'm not sure how you get this "instance" with attributes related to errors. Are you catching these through an errback?

The HttpError middleware (enabled by default) filters out non-2xx responses before they reach your callback. You can receive them in the callback by defining a handle_httpstatus_list attribute on your spider.

Example:

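# Import path as of this thread; in Scrapy 1.0 and later use:
#   from scrapy.spiders import Spider
from scrapy.spider import Spider

class ErrorSpider(Spider):
    name = "testerror"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/",
    ]
    # Let 404 responses through to the callback instead of dropping them
    handle_httpstatus_list = [404]

    def parse(self, response):
        # response.status is available here for 200 and 404 alike
        self.log("type: %s; status %d" % (type(response), response.status))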



Hakim Benoudjit

Apr 20, 2014, 11:53:47 AM
to scrapy...@googlegroups.com
I resolved it by passing errback=, a callback that handles HTTP errors, to the Request() constructor.
Apparently, in a Spider, inside a callback (other than parse), the response isn't defined when the HTTP status is 404.
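
For anyone finding this later, here is a minimal sketch of how the status could be read inside an errback. An errback receives a Twisted Failure rather than a response, which may be the "instance" object described earlier in the thread. When the failure comes from the HttpError middleware, the original response is available as failure.value.response. The helper name status_from_failure and the two fake objects are made up for illustration and are not part of Scrapy.

```python
from types import SimpleNamespace


def status_from_failure(failure):
    """Return the HTTP status carried by an errback failure, or None.

    Assumes `failure` is the Twisted Failure that Scrapy hands to an errback.
    For HTTP errors, `failure.value` is an exception that keeps the original
    response in its `response` attribute. Other failures (DNS errors,
    timeouts, ...) have no response, so None is returned for them.
    """
    response = getattr(failure.value, "response", None)
    return getattr(response, "status", None)


# Stand-in objects for demonstration only; these are NOT Scrapy classes.
fake_404 = SimpleNamespace(
    value=SimpleNamespace(response=SimpleNamespace(status=404))
)
fake_timeout = SimpleNamespace(value=TimeoutError("no response received"))

print(status_from_failure(fake_404))
print(status_from_failure(fake_timeout))
```

In a spider, the errback would be attached with something like Request(url, callback=self.parse_page, errback=self.on_error), where parse_page and on_error are hypothetical method names and on_error calls status_from_failure(failure).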