0 spiders when deploying with scrapyd / scrapyd-client


Arnaud Knobloch

Feb 16, 2017, 2:01:15 PM
to scrapy-users
Hi there,

I created my first Scrapy project. I have an Ubuntu 16.04 server. I installed scrapyd and scrapyd-client with pip (I had dependency problems with apt-get).
When I deploy, there is no spider available...

scrapyd-deploy tre -p m_scrapy
fatal: No names found, cannot describe anything.
Packing version r14-master
Deploying to project "m_scrapy" in http://IP:6800/addversion.json
Server response (200):
{"status": "ok", "project": "m_scrapy", "version": "r14-master", "spiders": 0, "node_name": "Tre"}

fatal: No names found, cannot describe anything. --> This doesn't seem important. I'm using version = GIT in my scrapy.cfg and I don't have any annotated tags.

curl http://IP:6800/schedule.json -d project=m_scrapy -d spider=m_scrapy
{"status": "error", "message": "spider 'm_scrapy' not found"}

{"status": "ok", "projects": ["m_scrapy"], "node_name": "Tre"}

{"status": "ok", "spiders": [], "node_name": "Tre"}
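The two bare JSON responses above have the shape of scrapyd's listprojects.json and listspiders.json replies. A minimal Python 3 sketch of the spider check, assuming scrapyd is reachable at a placeholder URL (the thread only shows "IP"); the helper names are hypothetical and not part of scrapyd or scrapyd-client:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder: the thread only shows "http://IP:6800", so substitute your own host.
SCRAPYD_URL = "http://localhost:6800"


def listspiders_url(base_url, project):
    """Build the URL for scrapyd's listspiders.json endpoint."""
    query = urlencode({"project": project})
    return "%s/listspiders.json?%s" % (base_url.rstrip("/"), query)


def spiders_from_response(body):
    """Return the spider names from a listspiders.json response body.

    Raises ValueError if scrapyd reported a non-"ok" status.
    """
    data = json.loads(body)
    if data.get("status") != "ok":
        raise ValueError(data.get("message", "scrapyd returned an error"))
    return data.get("spiders", [])


def fetch_spiders(base_url, project):
    """Query a running scrapyd instance (this performs a network call)."""
    with urlopen(listspiders_url(base_url, project)) as resp:
        return spiders_from_response(resp.read().decode("utf-8"))
```

Calling fetch_spiders(SCRAPYD_URL, "m_scrapy") against the server described here would return an empty list, matching the "spiders": [] response.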

scrapy.cfg

[settings]
default = m_scrapy.settings

[deploy:local]
project = m_scrapy
version = GIT

[deploy:tre]
project = m_scrapy
version = GIT

settings.py

import time  # needed by the LOG_FILE line below

BOT_NAME = 'm_scrapy'

SPIDER_MODULES = ['m_scrapy.spiders']
NEWSPIDER_MODULE = 'm_scrapy.spiders'

ITEM_PIPELINES = {
    'm_scrapy.pipelines.MPhoneImagePipeline':100,
    'm_scrapy.pipelines.MAdItemPipeline':200,
    'm_scrapy.pipelines.MAdImagesPipeline':300
}

MPHONEIMAGEPIPELINE_IMAGES_URLS_FIELD = 'phone_image_url'
MPHONEIMAGEPIPELINE_RESULT_FIELD = 'phone_image'

COOKIES_DEBUG = True
LOG_ENABLED = True
LOG_LEVEL = 'WARNING'
LOG_STDOUT = False
LOG_FILE = "%s_%s.log" % (BOT_NAME, time.strftime('%d-%m-%Y'))
IMAGES_EXPIRES = 0
MIMAGESPIPELINE_IMAGES_EXPIRES = 0


When I run it on my computer, the crawler works fine.
I already tried deleting project.egg-info, setup.py and the build folder.

Another question: I don't have any .egg file in build. Is that normal?

Thanks!


Nikolaos-Digenis Karagiannis

Feb 17, 2017, 11:08:18 AM
to scrapy-users
To keep the egg, you have to pass the debug argument to scrapyd-deploy.
Try it and check whether the spider ends up in the egg at all.
Does the command `scrapy list` work?

Arnaud Knobloch

Feb 18, 2017, 5:05:21 AM
to scrapy-users
Hi,

scrapyd-deploy tre -p m_scrapy -d
fatal: No annotated tags can describe '852e945dcf15dc47652aa0164511bb36e9dd7ae3'.
However, there were unannotated tags: try --tags.
Packing version r15-master
Deploying to project "m_scrapy" in http://IP:6800/addversion.json
Server response (200):
{"status": "ok", "project": "m_scrapy", "version": "r15-master", "spiders": 0, "node_name": "Tre"}

Output dir not removed: /var/folders/p3/tbhnqdq9019dbfwjtkrbb50r0000gn/T/scrapydeploy-19AIrF

Where is the egg? I can't find anything.

The command scrapy list returns the name of my spider.

Arnaud Knobloch

Feb 18, 2017, 5:19:55 AM
to scrapy-users
I deleted the folder tbhnqdq9019dbfwjtkrbb50r0000gn and tried again. This time it was:

Server response (200):
{"status": "ok", "project": "m_scrapy", "version": "r15-master", "spiders": 0, "node_name": "Tre"}

Output dir not removed: /tmp/scrapydeploy-njQRX9

I went to this folder, renamed the .egg to .zip, extracted it, and I can see my spider in the spiders folder. But again it says "spiders": 0
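An egg is a zip archive, so it can also be inspected without renaming it. A minimal Python 3 sketch using the standard zipfile module; the function name and the default package argument are assumptions for illustration:

```python
import zipfile


def modules_in_egg(egg_path, package="m_scrapy.spiders"):
    """List the Python files an egg contains under the given package."""
    prefix = package.replace(".", "/") + "/"
    with zipfile.ZipFile(egg_path) as egg:
        names = egg.namelist()
    # Keep only source and byte-compiled modules inside the package directory.
    return sorted(
        name for name in names
        if name.startswith(prefix) and name.endswith((".py", ".pyc"))
    )
```

Pointing it at the .egg file left in the "Output dir not removed" folder should list m_spider.py if the spider was packaged. As this thread shows, a spider being present in the egg does not guarantee that scrapyd can load it.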

Arnaud Knobloch

Feb 19, 2017, 1:03:10 PM
to scrapy-users
I tried several things but I didn't find a way to deploy my spider. Using curl http://IP:6800/addversion.json gives me 0 spiders too.

Here is some more info, in case someone can help me.

Stderr

'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...

Stdout
copying build/lib/m_scrapy/pipelines.py -> build/bdist.macosx-10.11-x86_64/egg/m_scrapy
copying build/lib/m_scrapy/settings.py -> build/bdist.macosx-10.11-x86_64/egg/m_scrapy
creating build/bdist.macosx-10.11-x86_64/egg/m_scrapy/spiders
copying build/lib/m_scrapy/spiders/__init__.py -> build/bdist.macosx-10.11-x86_64/egg/m_scrapy/spiders
copying build/lib/m_scrapy/spiders/m_spider.py -> build/bdist.macosx-10.11-x86_64/egg/m_scrapy/spiders
byte-compiling build/bdist.macosx-10.11-x86_64/egg/m_scrapy/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.11-x86_64/egg/m_scrapy/items.py to items.pyc
byte-compiling build/bdist.macosx-10.11-x86_64/egg/m_scrapy/pipelines.py to pipelines.pyc
byte-compiling build/bdist.macosx-10.11-x86_64/egg/m_scrapy/settings.py to settings.pyc
byte-compiling build/bdist.macosx-10.11-x86_64/egg/m_scrapy/spiders/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.11-x86_64/egg/m_scrapy/spiders/m_spider.py to m_spider.pyc
creating build/bdist.macosx-10.11-x86_64/egg/EGG-INFO
copying project.egg-info/PKG-INFO -> build/bdist.macosx-10.11-x86_64/egg/EGG-INFO
copying project.egg-info/SOURCES.txt -> build/bdist.macosx-10.11-x86_64/egg/EGG-INFO
copying project.egg-info/dependency_links.txt -> build/bdist.macosx-10.11-x86_64/egg/EGG-INFO
copying project.egg-info/entry_points.txt -> build/bdist.macosx-10.11-x86_64/egg/EGG-INFO
copying project.egg-info/top_level.txt -> build/bdist.macosx-10.11-x86_64/egg/EGG-INFO
creating '/var/folders/p3/tbhnqdq9019dbfwjtkrbb50r0000gn/T/scrapydeploy-4p418b/project-1.0-py2.7.egg' and adding 'build/bdist.macosx-10.11-x8$
removing 'build/bdist.macosx-10.11-x86_64/egg' (and everything under it)

Nikolaos-Digenis Karagiannis

Mar 8, 2017, 5:51:03 AM
to scrapy...@googlegroups.com
Hi,

Sorry for the late reply.
It looks like the spider is indeed packaged in the egg.
There's a bug when defining LOG_STDOUT=True
but this is not the case for you.
Does it work if you remove the line `LOG_FILE =...`?

--
You received this message because you are subscribed to a topic in the Google Groups "scrapy-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scrapy-users/JlfoQ5LlAl4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to scrapy-users+unsubscribe@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.

Arnaud Knobloch

Mar 8, 2017, 7:29:23 AM
to scrapy-users
I finally found what the issue was. I'm posting it in case it helps someone else.

I have "import pytesseract" in my spider file, and I forgot to install it on my server. When the spider file itself fails to import (pytesseract, PIL or whatever), you deploy 0 spiders and you don't get any logs. If the import error is somewhere else (pipelines, for example), you get some logs to help you.
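One way to surface this kind of silent failure is to import the spider modules directly on the server. A minimal Python 3 sketch, assuming the project package is importable there (for example, run from the project directory); the function name is hypothetical and this is not how scrapyd itself loads spiders:

```python
import importlib
import pkgutil


def find_import_errors(package_name):
    """Import every module directly under a package and collect failures.

    Returns a dict mapping module name to the ImportError it raised.
    Subpackages are not walked recursively.
    """
    try:
        package = importlib.import_module(package_name)
    except ImportError as exc:
        return {package_name: exc}

    failures = {}
    # Packages expose __path__; a plain module has no submodules to list.
    search_path = getattr(package, "__path__", [])
    for _finder, name, _is_pkg in pkgutil.iter_modules(search_path, package_name + "."):
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = exc
    return failures
```

On a server missing pytesseract, find_import_errors("m_scrapy.spiders") would be expected to report the spider module together with the ImportError, which is the information that was missing from the logs here.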

Nikolaos-Digenis Karagiannis

Mar 8, 2017, 11:20:30 AM
to scrapy...@googlegroups.com
Hi,

This is definitely a bug
and if it works as you just described
there should be many users stumbling on it.

Thank you for the update
and for discovering it.

Do you mind reporting it in the issue tracker?
https://github.com/scrapy/scrapyd


Paul Tremberth

Mar 9, 2017, 6:36:40 AM
to scrapy...@googlegroups.com
Hello Arnaud,

What version of Scrapy are you using?
1.3.0 has this change: https://github.com/scrapy/scrapy/pull/2433 which swallows exceptions raised when importing spider modules,
and which, very unfortunately, is not mentioned in the release notes (totally my bad, I missed it).

It looks like this change was a bad move, since users do rely on import errors, for `scrapy list` for example.
And this seems to be especially the case here for scrapyd.
Could you check with scrapy 1.2.2?

There is a proposed change to make the warning optional,
and enabled by default only for the scrapy commands which do not need to load spiders (e.g. scrapy version).

If it gets approved, I think I'll backport it to 1.3 branch.

Best,
Paul.





mbkv

Apr 27, 2017, 11:29:35 AM
to scrapy-users
Hello guys,


regards,
Mani
