scrapy inline requests


Rolando Espinoza La Fuente

Feb 3, 2012, 4:26:53 PM
to scrapy...@googlegroups.com
I have released an experimental decorator that lets a spider callback
receive the response of a request inline, inside the callback itself.
The code, with an example spider, is here:
https://github.com/darkrho/scrapy-inline-requests

It mainly addresses the problem of building an item with
information from multiple pages. The common approach
is to perform the extra requests and pass the item along through the meta
attribute. For example:

def parse_item(self, response):
    # load item data ...
    item = self._load_item(response)
    next_url = urljoin(response.url, "/info")
    # pass the item object itself (not the string "item") to the next callback
    return Request(next_url, meta={"item": item},
                   callback=self.parse_item_info)

def parse_item_info(self, response):
    item = response.meta["item"]
    # load more data
    ...

But it becomes a mess when you need to perform many requests in strict order.
With the `inline_requests` decorator it looks like this:

@inline_requests
def parse_item(self, response):
    # load item data
    item = self._load_item(response)

    # perform next request
    next_url = urljoin(response.url, "/info")
    response = yield Request(next_url)

    # load more info and finally yield the item
    yield item
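
To make the mechanism concrete, here is a minimal, self-contained sketch of how a decorator like this could work. It is not the code from the repository: `inline_requests_sketch`, the stand-in `Request` and `Response` classes, and the toy `crawl` loop are all hypothetical names invented for illustration, and no Scrapy import is used. The idea is that the callback is a generator, and each yielded request gets a callback that resumes the generator with the downloaded response.

```python
from functools import wraps


class Request:
    """Stand-in for a request object (hypothetical, not scrapy's class)."""
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


class Response:
    """Stand-in for a downloaded page (hypothetical)."""
    def __init__(self, url, body):
        self.url = url
        self.body = body


def _advance(gen, response):
    """Resume the generator; collect its output until it needs another page."""
    output = []
    try:
        value = next(gen) if response is None else gen.send(response)
        while True:
            if isinstance(value, Request) and value.callback is None:
                # Hand the request to the engine; its response resumes us.
                value.callback = lambda resp: _advance(gen, resp)
                output.append(value)
                return output
            output.append(value)
            value = next(gen)
    except StopIteration:
        pass
    return output


def inline_requests_sketch(method):
    """Turn a generator callback into a chain of ordinary callbacks."""
    @wraps(method)
    def wrapper(self, response):
        return _advance(method(self, response), None)
    return wrapper


def crawl(spider, start_url, pages):
    """Toy synchronous engine; `pages` maps url -> body instead of downloading."""
    items = []
    pending = [Request(start_url, callback=spider.parse_item)]
    while pending:
        request = pending.pop(0)
        response = Response(request.url, pages[request.url])
        for result in request.callback(response):
            if isinstance(result, Request):
                pending.append(result)
            else:
                items.append(result)
    return items


class DemoSpider:
    @inline_requests_sketch
    def parse_item(self, response):
        item = {"title": response.body}
        info = yield Request(response.url + "/info")
        item["info"] = info.body
        price = yield Request(response.url + "/price")
        item["price"] = price.body
        yield item


PAGES = {
    "http://example.com/a": "Widget",
    "http://example.com/a/info": "A fine widget",
    "http://example.com/a/price": "9.99",
}
items = crawl(DemoSpider(), "http://example.com/a", PAGES)
```

The two extra requests run in strict order inside one method, with no item passed through meta. A real implementation would also need to deal with errors and with requests that already carry their own callback, which this sketch simply passes through untouched.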

The example project provides a real spider which illustrates the
decorator usage:
https://github.com/darkrho/scrapy-inline-requests/blob/master/example/stackoverflow/spider.py#L29

The decorator hasn't been fully tested under all use cases and it's
highly experimental, but I hope
you can help me to improve it.

To start playing with the decorator you can grab the code from github
or use `pip install scrapy-inline-requests`.

Regards,

~Rolando

Martin Loy

Feb 6, 2012, 9:11:22 AM
to scrapy...@googlegroups.com
Nice Work!!!

cheers!


--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To post to this group, send email to scrapy...@googlegroups.com.
To unsubscribe from this group, send email to scrapy-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/scrapy-users?hl=en.




--
Never was there a friend who did a dwarf a favor, nor an enemy who did him a wrong, who was not repaid in full.

vitsin

Feb 8, 2012, 10:58:19 AM
to scrapy-users
Great feature! Thank you.
Are there any plans to add this decorator to an upcoming release?
--vs



Pablo Hoffman

Feb 19, 2012, 4:08:16 AM
to scrapy...@googlegroups.com
Hi Rolando,

That is a very nice decorator indeed. An idea very well borrowed from
Twisted's inlineCallbacks :)

Even though its use of partials prevents it from being used with the
persistent scheduler, its syntax is convenient enough to put that
restriction aside. I look forward to introducing this decorator in the
next release. Could you add some tests and docs?
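
A rough illustration of the restriction mentioned above, under the assumption that a persistent scheduler has to serialize pending requests (here with `pickle`) so they can be written to disk. The names `callback_gen`, `resume`, and `can_pickle` are made up for this sketch and are not part of Scrapy or of the decorator. A continuation built from a partial bound to a live generator cannot be serialized, while the meta-style request data can.

```python
import pickle
from functools import partial


def callback_gen():
    """A paused callback, like a decorated parse method waiting on a page."""
    response = yield "request for /info"
    yield {"info": response}


def resume(gen, response):
    """Continue the paused callback with the downloaded response."""
    return gen.send(response)


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


gen = callback_gen()
next(gen)  # run up to the first yield

# Inline style: the continuation is a partial holding the live generator.
inline_callback = partial(resume, gen)

# Meta style: the callback is referenced by name, the state is plain data.
meta_style = {"url": "/info", "callback": "parse_item_info",
              "meta": {"item": {"title": "Widget"}}}

inline_ok = can_pickle(inline_callback)
meta_ok = can_pickle(meta_style)
```

Generator objects hold a suspended stack frame, which `pickle` refuses to serialize, so `inline_ok` comes out false while `meta_ok` is true.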

Thanks,
Pablo.

Владислав Полухин

Jan 15, 2013, 10:07:32 PM
to scrapy...@googlegroups.com
Pablo, what about introducing it?

On Sunday, February 19, 2012, 17:08:16 UTC+8, Pablo Hoffman wrote:

Pablo Hoffman

Jan 23, 2013, 1:28:42 AM
to scrapy...@googlegroups.com
I'm happy to review and merge a pull request that adds the inline_requests decorator that Rolando introduced a year ago. It can be based on Rolando's code, or be a complete rewrite. Tests are a must; docs are optional (but a nice plus).


--
To view this discussion on the web visit https://groups.google.com/d/msg/scrapy-users/-/OZTTCNjBj64J.