Anjali Arora
May 5, 2011, 11:56:55 AM
to scrapy...@googlegroups.com
Hi,
From a source page, I am extracting the following:
item['nlabel'] = site.select('div[2]/div/table/tbody/tr/td/text()').extract()
This works fine except where there are child tags within the td tag. If there is a <strong> tag within the <td>, the spider skips whatever is inside <strong>some text here</strong>, so the extracted information is incomplete. How do I get around this? XPath examples would be great.
Thanks.
-AA
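
For reference, the behaviour described above is standard XPath: td/text() selects only the text nodes that are direct children of the <td>, while td//text() selects every text node underneath it, including text inside <strong>. A minimal sketch of the difference follows, using Python's standard-library ElementTree rather than Scrapy's selector so that it runs on its own. The sample markup and the helper names direct_text and all_text are hypothetical, since the real page is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample row; the real page's markup is not shown in the post.
SAMPLE = "<tr><td>Label: <strong>some text here</strong> (end)</td></tr>"


def direct_text(elem):
    """Text nodes that are direct children of elem, like XPath 'td/text()'."""
    parts = [elem.text or ""]
    # Text following each child tag still belongs to elem itself.
    parts.extend(child.tail or "" for child in elem)
    return [p for p in parts if p]


def all_text(elem):
    """Every text node under elem, like XPath 'td//text()'."""
    return [t for t in elem.itertext() if t]


td = ET.fromstring(SAMPLE).find("td")

print(direct_text(td))        # text inside <strong> is missing
print(all_text(td))           # text inside <strong> is included
print("".join(all_text(td)))  # one string per cell

# Assumed Scrapy equivalent (untested against the actual page):
# item['nlabel'] = site.select('div[2]/div/table/tbody/tr/td//text()').extract()
```

With td//text() the extracted list has one entry per text node, so the pieces would probably need joining per cell, as the last print does.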