Perplexing: XPath works in shell, not in spider

Tech Master

Jul 14, 2011, 12:46:45 PM

to scrapy...@googlegroups.com

Hi All,

My start url is 'http://www.san-j.com/product_list.asp?id=1'

1. When I use the XPath '/html/body/table/tr/td/table/tr/td/table/tr[3]/td[2]/table/tr/td/table/tr[5]/td/text()' with .extract() in the shell, I get exactly the text I need, namely [u'Water, Soybeans, Salt, Alcohol (to preserve freshness)']

However, when I use the same XPath in the spider (except for splitting part of it into sites = hxs.select(...)), I draw a complete blank. I'd love to know what is going on here.

2. On a side note, I'd also like to capture '/html/body/table/tr/td/table/tr/td/table/tr[3]/td[2]/table/tr/td/table/tr[7]/td/text()' in the same item that I captured in #1 above. Only the row index differs (tr[5] versus tr[7]), so essentially I need the text within both tr[5] and tr[7] in the same item. How would I do that?

Thanks.

-tm

David C.

Jul 15, 2011, 1:44:48 AM

to scrapy...@googlegroups.com

1. XPath expressions can also be created using relative location paths:

$ scrapy shell 'http://www.san-j.com/product_info.asp?id=1' --nolog
hxs.select("//b[contains(.,'Ingredients: ')]/parent::td/text()").extract()
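
2. For capturing tr[5] and tr[7] in the same item, one approach is to select the enclosing table once and then query each row with a path relative to that node. Note that in XPath an expression beginning with '/' always starts from the document root, even when it is applied to a sub-selection such as sites; that may be why the spider came back empty, though the spider code is not shown in this thread, so this is only a guess. Below is a minimal sketch of the idea. It uses the standard library's xml.etree.ElementTree instead of Scrapy's selectors so that it runs offline, and SAMPLE, extract_item and the "row7" field name are made up for illustration; the real page's markup will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the product table. Only the text of
# row 5 comes from the thread; every other row is a placeholder.
SAMPLE = """
<html><body><table>
  <tr><td>row 1</td></tr>
  <tr><td>row 2</td></tr>
  <tr><td>row 3</td></tr>
  <tr><td>row 4</td></tr>
  <tr><td>Water, Soybeans, Salt, Alcohol (to preserve freshness)</td></tr>
  <tr><td>row 6</td></tr>
  <tr><td>placeholder for row 7</td></tr>
</table></body></html>
"""


def extract_item(table):
    """Build one item from a single table node.

    Both paths are relative to `table` (no leading slash), so they are
    resolved from that node rather than from the document root.
    """
    return {
        "ingredients": (table.findtext("tr[5]/td") or "").strip(),
        "row7": (table.findtext("tr[7]/td") or "").strip(),
    }


root = ET.fromstring(SAMPLE.strip())

# Select the parent node once, then fill both fields of the same item from it.
items = [extract_item(table) for table in root.findall("./body/table")]

for item in items:
    print(item)
```

In a spider the same pattern would apply to each node returned by hxs.select(...): call .select() on that node with expressions such as 'tr[5]/td/text()' and 'tr[7]/td/text()', written without a leading slash, and assign both results to fields of the same item.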
