Scrapy giving 404 for valid URL


Tapasweni Pathak

Aug 6, 2014, 7:29:12 AM
to scrapy...@googlegroups.com
Hi,

I am scraping Zara. As soon as I start my crawler, it returns a 404 for the very first link.

Here are my items.py, spider.py and settings.py.

Why is Scrapy giving me a 404?


Thanks,


Rolando Espinoza La Fuente

Aug 6, 2014, 6:01:25 PM
to scrapy...@googlegroups.com
Can you check what status the browser gets? I've seen sites that reply with a 404 status and still show the content, perhaps as a strategy to keep products out of search engines. You can also run "scrapy shell URL" to see whether you get the 404 and to inspect the response body.

If you are sure the 404 responses contain the content you want, you can use the handle_httpstatus_list attribute so that those responses are not ignored. See

Regards,
Rolando


--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.

Nicolás Alejandro Ramírez Quiros

Aug 18, 2014, 2:55:08 PM
to scrapy...@googlegroups.com
This looks related to your user agent; try changing it.
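By default Scrapy sends a User-Agent that identifies the client as Scrapy, which some sites answer with an error page. A minimal sketch of overriding it in settings.py, assuming the site filters on that header; the browser string below is only an illustrative example, so copy the one your own browser actually sends.

```python
# settings.py (excerpt)
# Illustrative browser-like User-Agent string (an assumption, not a
# value the site is known to require); replace it with your browser's.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
)
```

You can try a value before editing settings.py by passing it on the command line, e.g. scrapy shell -s USER_AGENT="..." URL, and then checking response.status.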