Low speed of xpath select operation

Low speed of xpath select operation Grigoriy Petukhov 11/14/11 1:12 AM
Hi guys,

I am learning scrapy and I've run into a strange problem.

Here is the code of a simple spider that reproduces my issue. I tried to
write code that uses only HtmlXPathSelector,
but I do not understand how to build a Response object manually. So the
code of the spider is:


    # -*- coding: utf-8 -*-
    import time

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector

    class GigantclipsSpider(BaseSpider):
        name = 'test'
        start_urls = ['http://tubesexclips.com/']

        def parse(self, response):
            hxs = HtmlXPathSelector(response)

            start = time.time()
            hxs.select('//div[@class="thumb"]').extract()[0]
            print '%.2f' % (time.time() - start)

            start = time.time()
            hxs.select('//div[@class="thumb"]/a').extract()[0]
            print '%.2f' % (time.time() - start)

If I run `scrapy crawl test` I get:

0.07
14.22

14 seconds for a simple xpath query! Please point out what I am doing
wrong. I've tested this query with lxml and it took less than a second,
as expected.
Re: Low speed of xpath select operation Rolando Espinoza La fuente 11/14/11 5:56 AM
> 14 seconds for a simple xpath query! Please point out what I am doing
> wrong. I've tested this query with lxml and it took less than a second,
> as expected.

That simple xpath query matches 547 nodes. The HXS wrapper
is slower than raw API calls when many nodes are selected.
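
To illustrate the "many nodes" point: the spider calls `.extract()` on every match and only then keeps element `[0]`. The sketch below does not use Scrapy and does not reproduce the wrapper's overhead. It uses the standard library's `xml.etree.ElementTree` on a made-up document (547 divs, hypothetical `/clip/N` links) only to show that both patterns return the same first result while doing very different amounts of per-node work.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: 547 thumb divs with one link each
# (547 is the match count mentioned above; the hrefs are invented).
N = 547
root = ET.Element("html")
body = ET.SubElement(root, "body")
for i in range(N):
    div = ET.SubElement(body, "div", {"class": "thumb"})
    link = ET.SubElement(div, "a", {"href": "/clip/%d" % i})
    link.text = "clip %d" % i

XPATH = ".//div[@class='thumb']/a"

# Pattern from the spider: serialize every match, then keep only the first.
all_serialized = [ET.tostring(a, encoding="unicode") for a in root.findall(XPATH)]
first_via_all = all_serialized[0]

# Alternative: pick the first node, then serialize just that one.
first_only = ET.tostring(root.find(XPATH), encoding="unicode")

print(len(all_serialized))          # 547
print(first_via_all == first_only)  # True
```

In the spider, the analogous change would probably be to index the selector list before extracting, e.g. `hxs.select(...)[0].extract()`. Whether that removes the 14 seconds depends on where the wrapper actually spends its time, so measure it.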

If you want only the first link in each div (which seems unlikely), you can use
this xpath: //div[@class="thumb"]/a[1]
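
Note that a position predicate like `a[1]` is applied per parent, so it returns the first `<a>` of every matching div, not one node for the whole page. A minimal sketch of that behaviour, assuming a made-up two-div document and using the standard library's `xml.etree.ElementTree` as a stand-in for the Scrapy selector:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page:
# two thumb divs with two links each.
DOC = """
<html><body>
  <div class="thumb"><a href="/a1">a1</a><a href="/a2">a2</a></div>
  <div class="thumb"><a href="/b1">b1</a><a href="/b2">b2</a></div>
</body></html>
"""

root = ET.fromstring(DOC)

# a[1] is evaluated per parent: the first <a> of every matching div.
first_per_div = [a.get("href") for a in root.findall(".//div[@class='thumb']/a[1]")]

# find() stops at the first match in document order: a single element.
first_overall = root.find(".//div[@class='thumb']/a").get("href")

print(first_per_div)   # ['/a1', '/b1']
print(first_overall)   # /a1
```

With a full XPath 1.0 engine, the single first match is usually written as `(//div[@class="thumb"]/a)[1]`. ElementTree does not support that parenthesized form, which is why the sketch uses `find()` instead.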

If you want to extract links in order to crawl them, use the link extractor
which is faster in this case:

$ scrapy shell url
...
>>> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
>>> lx = SgmlLinkExtractor(restrict_xpaths='//div[@class="thumb"]')
>>> ret = lx.extract_links(response)

Regards,

~Rolando
