class MySpider(BaseSpider):
    name = 'xpto'
    allowed_domains = ['website.com']
    start_urls = [
        'http://www.website.com/places.asp?id=100'
    ]

    def parse(self, response):
        sel = Selector(response)
        places = sel.xpath('//p[@class="txt"]/table//td[@class="txtm"]/a/@href').extract()
        for place in places:
            print place

def parse(self, response):
    open('test.html', 'wb').write(response.body)

Can you post the link to the actual page?
Without more information, any suggestions would just be guessing. If you can't, I'd recommend loading the page in scrapy shell and trying to figure it out that way.
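
One way to do that kind of narrowing is to evaluate the XPath one step at a time and watch where the match count drops to zero. The sketch below is Python 3 and uses only the standard library so it runs without Scrapy. The markup, the `STEPS` list and the `match_counts` helper are made up for illustration and are not the real page. ElementTree also needs well-formed markup and supports only a subset of XPath, so treat this as the idea rather than the exact tool. In scrapy shell the same approach would apply to `sel.xpath(...)` on the real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the raw markup the spider receives.
# Here the table is a sibling of the <p>, not a child of it.
RAW = """
<html><body>
  <p class="txt">Places</p>
  <table>
    <tr><td class="txtm"><a href="place.asp?id=1">One</a></td></tr>
    <tr><td class="txtm"><a href="place.asp?id=2">Two</a></td></tr>
  </table>
</body></html>
"""

# The expression from the spider, split into steps (without the final @href,
# which ElementTree cannot select directly).
STEPS = [".//p[@class='txt']", "/table", "//td[@class='txtm']", "/a"]


def match_counts(root, steps):
    """Return (path, number_of_matches) for each cumulative prefix of steps."""
    counts = []
    path = ""
    for step in steps:
        path += step
        counts.append((path, len(root.findall(path))))
    return counts


root = ET.fromstring(RAW)
counts = match_counts(root, STEPS)
for path, n in counts:
    print(n, path)

# Once the failing step is known, a looser expression can skip it.
links = [a.get("href") for a in root.findall(".//td[@class='txtm']/a")]
print(links)
```

With this sample markup the count falls to zero at the `/table` step, which points at the `p`/`table` relationship as the part of the expression to loosen.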
--
You received this message because you are subscribed to a topic in the Google Groups "scrapy-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scrapy-users/rvq9fGDPRWI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
I do. I've attached an extremely simple spider that crawls those links. Hopefully the code will answer your questions. If not, feel free to ask more.
As for why that particular XPath works in the browser but not in scrapy shell, my guess is that the data is delivered with the page itself (so no AJAX) and some JavaScript then modifies the DOM. There are a lot of ads on those pages, so I wouldn't be surprised.
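
A rough way to test that guess is to check which tag/class pairs are present in the raw body that Scrapy downloaded, before any JavaScript has run. The sketch below is Python 3, standard library only. `raw_body` is a made-up stand-in for the saved response bytes, and `ClassCounter` is a hypothetical helper, not part of Scrapy.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count (tag, class) pairs seen in start tags of the markup."""

    def __init__(self):
        super().__init__()
        self.seen = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                for cls in value.split():
                    self.seen[(tag, cls)] += 1


# Hypothetical stand-in for the bytes of the downloaded page.
raw_body = (
    b'<p class="txt">Places<table><tr><td class="txtm">'
    b'<a href="place.asp?id=1">One</a></td></tr></table>'
)

counter = ClassCounter()
counter.feed(raw_body.decode("utf-8", errors="replace"))
counter.close()

for (tag, cls), n in sorted(counter.seen.items()):
    print(tag, cls, n)
```

If the classes used in the XPath show up in the raw body, the data is there and only the path to it differs from what the browser shows. If they are missing, the content is probably being added after the page loads.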
<portugal.py>