How to get all text including within child tags


Anjali Arora

May 5, 2011, 11:56:55 AM
to scrapy...@googlegroups.com
Hi,

From a source page, I am extracting the following:

item['nlabel'] = site.select('div[2]/div/table/tbody/tr/td/text()').extract()

This works fine except where the td contains child tags. If there is a <strong> tag inside the <td>, the spider skips whatever is inside it (<strong>some text here</strong>), so the extracted information is incomplete. How do I get around this? XPath examples would be great.

Thanks.
-AA

Rolando Espinoza La Fuente

May 5, 2011, 12:08:21 PM
to scrapy...@googlegroups.com

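Use //text() instead of /text() at the end of the XPath. It selects text nodes at any depth under the td, not only its direct children.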
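
Applied to the expression in the question, that would presumably read div[2]/div/table/tbody/tr/td//text(). To see why the two forms differ, here is a minimal sketch that runs without Scrapy. It uses Python's standard-library ElementTree, which does not support the text() step in its XPath subset, so the two helper functions below (hypothetical names) only emulate what /text() and //text() select. The sample cell is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cell, modelled on the <td> described in the question.
td = ET.fromstring("<td>Label: <strong>some text here</strong> and more</td>")


def direct_text(element):
    """Emulate XPath `td/text()`: text nodes that are direct children only."""
    # In ElementTree, direct text is the element's own leading text plus
    # the "tail" text that follows each child element.
    parts = [element.text] + [child.tail for child in element]
    return [part for part in parts if part]


def descendant_text(element):
    """Emulate XPath `td//text()`: text nodes at any depth."""
    return list(element.itertext())


print(direct_text(td))      # ['Label: ', ' and more']
print(descendant_text(td))  # ['Label: ', 'some text here', ' and more']
```

Either way the result is a list of strings, as with extract() in Scrapy, so something like "".join(...) is needed to get the cell's text as one string.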

Regards,

~Rolando

Anjali Arora

May 5, 2011, 12:35:26 PM
to scrapy...@googlegroups.com
Thanks, that does it for me!

Best.
-AA

