Feed Exporter settings overridden by Scrapyd default item storage


Raymond Ballew

May 5, 2012, 5:07:21 AM
to scrapy-users
I want to use a Feed Exporter to send items to S3 storage. I can get
this working locally, but when running through scrapyd the settings
are overridden by the new item storage feature in Scrapy 0.15 (I'm
using the latest dev version). My settings are:

FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
FEED_FORMAT = 'jsonlines'
FEED_STORE_EMPTY = True
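As a side note for anyone adapting these settings: the `%(name)s` and `%(time)s` placeholders in `FEED_URI` are filled with printf-style mapping keys (spider name and feed timestamp). A small illustration of that substitution, with made-up values:

```python
# How Scrapy fills the placeholders in FEED_URI (illustrative values only).
FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'

params = {'name': 'myspider', 'time': '2012-05-05T09-41-56'}
print(FEED_URI % params)
# s3://MYBUCKET/feeds/myspider/2012-05-05T09-41-56.jl
```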

I've tried putting

ITEMS_DIR =

in the scrapyd settings file at /etc/scrapyd/conf.d/000-default per
http://doc.scrapy.org/en/latest/topics/scrapyd.html#items-dir, but
then I just get a mkdir permission-denied error in scrapyd.log and
the spider doesn't run (traceback at end).

Is there a way either to have my S3 export as an additional feed, or
to have it replace the default scrapyd feed?

I'd be grateful for any help that can be offered.

2012-05-05 09:41:56+0100 [-] Unhandled Error
    Traceback (most recent call last):
      File "/usr/lib/pymodules/python2.6/scrapyd/poller.py", line 24, in poll
        returnValue(self.dq.put(self._message(msg, p)))
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 1128, in put
        self.waiting.pop(0).callback(obj)
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 280, in callback
        self._startRunCallbacks(result)
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 354, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
      File "/usr/lib/pymodules/python2.6/scrapyd/launcher.py", line 43, in _spawn_process
        env = e.get_environment(msg, slot)
      File "/usr/lib/pymodules/python2.6/scrapyd/environ.py", line 32, in get_environment
        env['SCRAPY_FEED_URI'] = self._get_file(message, self.items_dir, 'jl')
      File "/usr/lib/pymodules/python2.6/scrapyd/environ.py", line 39, in _get_file
        os.makedirs(logsdir)
      File "/usr/lib/python2.6/os.py", line 150, in makedirs
        makedirs(head, mode)
      File "/usr/lib/python2.6/os.py", line 157, in makedirs
        mkdir(name, mode)
    exceptions.OSError: [Errno 13] Permission denied: 'myproject'

Raymond Ballew

May 10, 2012, 6:28:02 PM
to scrapy-users
Pablo - I saw that you made a commit to address this issue. Thanks for
that! I now get a different exception though, because SCRAPY_FEED_URI
is not set when items_dir is not set. Full traceback below:

2012-05-09 19:31:45+0000 [-] Unhandled Error
    Traceback (most recent call last):
      File "/usr/lib/pymodules/python2.7/scrapyd/poller.py", line 24, in poll
        returnValue(self.dq.put(self._message(msg, p)))
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1419, in put
        self.waiting.pop(0).callback(obj)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
        self._startRunCallbacks(result)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/usr/lib/pymodules/python2.7/scrapyd/launcher.py", line 46, in _spawn_process
        msg['_job'], env)
      File "/usr/lib/pymodules/python2.7/scrapyd/launcher.py", line 71, in __init__
        self.itemsfile = env['SCRAPY_FEED_URI']
    exceptions.KeyError: 'SCRAPY_FEED_URI'

Pablo Hoffman

May 21, 2012, 1:30:22 PM
to scrapy...@googlegroups.com, Raymond Ballew
Hi Raymond,

That bug is fixed here, thanks for reporting!
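For readers hitting the same KeyError before upgrading: the crash comes from launcher.py indexing env['SCRAPY_FEED_URI'] unconditionally. An illustrative sketch of the kind of guard needed (not the exact commit) is:

```python
def get_items_file(env):
    # Tolerate a missing SCRAPY_FEED_URI instead of indexing env directly,
    # so jobs still launch when scrapyd's items_dir is empty.
    return env.get('SCRAPY_FEED_URI')

print(get_items_file({}))                            # None
print(get_items_file({'SCRAPY_FEED_URI': 'x.jl'}))   # x.jl
```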

Pablo Hoffman

Sep 9, 2012, 9:48:19 PM
to scrapy...@googlegroups.com, Raymond Ballew
Install the scrapyd-0.15 package (apt-get install scrapyd-0.15).

On Sun, Sep 9, 2012 at 3:19 PM, Jonathan Rhone <rho...@gmail.com> wrote:
Hey,

How do I go about installing scrapyd on an ubuntu server with this change?

Thanks,

Jon

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/scrapy-users/-/fyAKbZT8z34J.

To post to this group, send email to scrapy...@googlegroups.com.
To unsubscribe from this group, send email to scrapy-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/scrapy-users?hl=en.

limca

Nov 19, 2012, 1:13:02 PM
to scrapy...@googlegroups.com
Ahh, I found the answer to my question - I deleted the value of items_dir in the Scrapyd settings file, then removed and redeployed the project folders to Scrapyd. Now the files generated in CSV format are stored in the Twisted folder.

On Sunday, November 18, 2012 9:18:12 PM UTC-5, limca wrote:
Hi Mohshin, Pablo,

I am using Scrapyd on Windows, and in my project's settings file I have defined FEED_URI as file.csv and FEED_FORMAT = csv. However, these settings seem to be ignored and the output is available only in JSON Lines format on the items link. Any ideas on how I can get the output written to the CSV file? Also, Mohsin, when you say the output is stored locally, what format are you saving it in? Any input will be appreciated.

Regards,
Aniket

On Saturday, November 17, 2012 7:14:58 AM UTC-5, Mohsin Hijazee wrote:
Hi Pablo,
  I am using Scrapy 0.17 from the Ubuntu packages and have pointed FEED_URI at S3, but it only works locally, not on the remote deployment. Any clue about that?

Thanks in advance

Mohsin Hijazee

Nov 20, 2012, 6:24:44 AM
to scrapy...@googlegroups.com
Limca,
  Glad that it works for you, but on Ubuntu that's not the case for me. Removing the items_dir setting from /etc/scrapyd/conf.d/000-default means no job runs at all.


Pablo,
  I note that SCRAPY_FEED_URI is picked up from the environment, as can be seen here. Can you please tell me how that would work for a project that is deployed as an egg? I am going through the source, and if I find the answer I will post it here.
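One way to see why egg deployment shouldn't matter: scrapyd passes the override through the process environment, and Scrapy versions of that era read SCRAPY_-prefixed environment variables back as setting overrides regardless of how the project code was packaged. A rough illustration of that prefix-stripping idea (not the actual Scrapy source):

```python
def env_overrides(environ):
    """Collect setting overrides from SCRAPY_-prefixed environment
    variables, mimicking (illustratively) how scrapyd's per-job
    SCRAPY_FEED_URI trumps the project's own FEED_URI setting."""
    prefix = 'SCRAPY_'
    return {k[len(prefix):]: v
            for k, v in environ.items() if k.startswith(prefix)}

print(env_overrides({'SCRAPY_FEED_URI': 'items/p/s/j.jl', 'PATH': '/usr/bin'}))
# {'FEED_URI': 'items/p/s/j.jl'}
```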

Pablo Hoffman

Dec 28, 2012, 12:38:01 PM
to scrapy...@googlegroups.com
Mohsin, instead of removing it, try setting it to an empty value, like so:

items_dir =
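For anyone unsure where that line goes, a sketch of what the file could look like (section header per the scrapyd docs; path per the Ubuntu package layout mentioned earlier in the thread):

```ini
# /etc/scrapyd/conf.d/000-default (sketch)
[scrapyd]
items_dir =
```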



Devin Shields

Mar 4, 2014, 4:44:31 PM
to scrapy...@googlegroups.com
I'm having the same issue, any help would be appreciated.

On Tuesday, April 9, 2013 4:14:00 AM UTC-4, Arik wrote:
I tried to do the following and I can't seem to get it to work,
I have a working S3 setting (tested locally) and items_dir set to empty, but scrapyd keeps saving the items locally.
Any help would be appreciated.

Juraj Variny

Nov 8, 2014, 3:48:42 PM
to scrapy...@googlegroups.com
Hi all, this simple patch fixes it:

https://github.com/scrapy/scrapyd/pull/67

