scrapyd and __file__ (pkgutil.get_data)


Thimo Brinkmann

unread,
Feb 16, 2013, 7:55:17 PM2/16/13
to scrapy...@googlegroups.com
Hey guys,

I am having a really hard time deploying to scrapyd because of the static configuration files (let's call them dictionary files) that need to be used.
I've played around with MANIFEST.in, pkgutil.get_data, StringIO and whatever else I could find for more than 4 hours now, but really can't get it to work. Does anyone have a working example?

I want to load two static JSON files from scrapyd, but it never seems to find the files, whatever referencing method I use. Normally I use open() followed by the filename, and I put the file in the egg's root as well as in the project's egg folder, but in neither case were the files found. If anyone knows how to do this and can show a full example, I would be very grateful.


Kind regards,
Thimo

Pablo Hoffman

unread,
Mar 14, 2013, 12:52:46 PM3/14/13
to scrapy...@googlegroups.com
This is a snippet I've used to make deploy work with static files:

```
from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    package_data={
        'myproject': ['conf/*.json', 'conf/cities.txt'],
    },
    entry_points={
        'scrapy': ['settings = myproject.settings']
    },
    zip_safe=False,
)
```

The conf/ dir is inside the myproject folder.

And then you use myproject.__file__ to access the folder in your code.
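A minimal sketch of that pattern (using a stdlib package as a stand-in, since `myproject` only exists inside the deployed egg; the `conf/cities.txt` layout is the assumption from the setup.py above):

```python
import os
import json as pkg  # stand-in; in a real project this would be `import myproject`

# __file__ points at the installed package itself, not the current working
# directory, so paths built from it survive the egg deploy
pkg_dir = os.path.dirname(pkg.__file__)
conf_path = os.path.join(pkg_dir, 'conf', 'cities.txt')  # hypothetical layout
```

This is also why `zip_safe=False` matters: with a zipped egg, `__file__` would not point at a real directory you can `open()` files from.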

Pablo.



--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

David McClure

unread,
Mar 31, 2014, 1:18:20 PM3/31/14
to scrapy...@googlegroups.com
Hi Pablo,

First of all, thanks so much for all the work on Scrapy! It's a fantastic tool. Anyway, I'm actually running into this problem too - I'm writing a crawler that needs to read an XML configuration file, which works fine when the spider is run from the command line. But when deployed to Scrapyd, it's as if the configuration file doesn't exist. My setup.py file looks like this:

```
from setuptools import setup, find_packages

setup(
    name = 'myproject',
    version = '0.1',
    packages = find_packages(),
    package_data = {
        'myproject': ['content/*.xml', 'content/category/*.xml']
    },
    entry_points = {
        'scrapy': ['settings = myproject.settings']
    },
    zip_safe = False
)
```

Where /content is inside the /myproject module. And when I manually inspect the built egg that gets deployed to Scrapyd, the XML files are included. Any idea what could be going on here?

Thanks!
David

Pablo Hoffman

unread,
Apr 19, 2014, 10:53:46 AM4/19/14
to scrapy-users
Are you using pkgutil.get_data()? What exception do you get?
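For reference, a self-contained sketch of the pkgutil.get_data() call (the snippet builds a throwaway `myproject` package with a data file just so it can run; in a real deploy the package comes from your egg, and the `conf/cities.json` name is illustrative):

```python
import json
import os
import pkgutil
import sys
import tempfile

# Build a disposable package so the example is runnable as-is
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'myproject', 'conf'))
open(os.path.join(root, 'myproject', '__init__.py'), 'w').close()
with open(os.path.join(root, 'myproject', 'conf', 'cities.json'), 'w') as f:
    json.dump({'cities': ['Montevideo']}, f)
sys.path.insert(0, root)

# get_data() takes the package name and a path relative to the package
# root, and returns the file contents as bytes (or None if not found)
raw = pkgutil.get_data('myproject', 'conf/cities.json')
data = json.loads(raw.decode('utf-8'))
```

The important detail is that the second argument is relative to the package, not to the egg or the working directory.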


Ciprian Lucaci

unread,
Nov 25, 2016, 9:00:52 AM11/25/16
to scrapy-users
I have run into the same problem.
I included the files in the setup.py as in the example but I am not able to load the data with pkgutil.get_data().

Can someone please post a sample code for that?

I am deploying the spiders on scrapyd and I want to read all the .json files in conf/

Thanks.

Rolando Espinoza

unread,
Nov 25, 2016, 11:00:35 AM11/25/16
to scrapy...@googlegroups.com
When you use package_data in setup.py, the recommended way to access the files is using pkg_resources:

>>> import pkg_resources
>>> pkg_resources.resource_filename('myproject', 'conf/cities.txt')
'/path/to/package/conf/cities.txt'

Note: your package name should match the key you use in package_data.
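To round that out with the JSON case Ciprian asked about, a sketch using pkg_resources.resource_string (again with a disposable stand-in package so it runs on its own; in a deployed egg `myproject` already exists and matches the package_data key):

```python
import json
import os
import sys
import tempfile

import pkg_resources

# Disposable stand-in package, only so the snippet is self-contained
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'myproject', 'conf'))
open(os.path.join(root, 'myproject', '__init__.py'), 'w').close()
with open(os.path.join(root, 'myproject', 'conf', 'cities.json'), 'w') as f:
    json.dump(['Montevideo', 'Madrid'], f)
sys.path.insert(0, root)

# resource_string() returns the file contents as bytes, which
# json.loads() accepts directly
cities = json.loads(pkg_resources.resource_string('myproject', 'conf/cities.json'))
```

Unlike building paths from `__file__` by hand, pkg_resources also works when the package is imported from a zipped egg.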

Regards,

Rolando


Nikolaos-Digenis Karagiannis

unread,
Nov 26, 2016, 4:13:23 AM11/26/16
to scrapy-users
Are you going to parse the JSON files in Python?
If so, just write them as native Python data structures
in a Python file
and skip the json.load part.
Not only will you avoid pkgutil,
you also get Python syntax,
which has raw strings
and allows trailing commas.
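That suggestion would look something like this (module and variable names are illustrative; the file would live at e.g. myproject/conf.py and ship inside the egg with no package_data needed):

```python
# myproject/conf.py - configuration as plain Python instead of JSON files
CITIES = [
    'Montevideo',
    'Madrid',  # trailing commas are allowed, unlike in JSON
]

CATEGORIES = {
    # raw strings need no double-escaping, unlike JSON
    'news': r'https://example.com/news/.*',
}
```

Then spiders just do `from myproject.conf import CITIES`, and the whole resource-loading question disappears.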