error in running scrapy tutorial


Republic

Feb 25, 2010, 1:43:01 AM
to scrapy-users
Hi,

I am new to Python and Scrapy. I was just trying out Scrapy's
tutorial and ran into some beginner's problems.
I have installed Scrapy on a Windows platform according to the
installation guide
a) installed python 2.6.4 windows installer
b) installed Twisted for Windows -
c) installed pywin32
d) installed libxml2 for Windows
e) installed PyOpenSSL for Windows
f) installed Scrapy-0.8.win32.exe for windows
I have also set the system path to C:/python/scripts

but I receive the error
----------------------------------------------------------
IDLE 2.6.4
>>> python scrapy-ctl.py startproject dmoz
SyntaxError: invalid syntax
>>>

-----------------------------------------------------------
seems like it is unable to find scrapy-ctl.py

I then went on to try to set CLASSPATH to

.;C:\Program Files\Java\jre6\lib\ext\QTJava.zip;C:\Python26\Lib\site-packages\scrapy\templates\project\scrapy-ctl.py


Hope someone can advise?


FM

Andres Moreira

Feb 25, 2010, 5:34:05 AM
to scrapy...@googlegroups.com
Hi Republic,


> but I receive the error
> ----------------------------------------------------------
> IDLE 2.6.4
> >>> python scrapy-ctl.py startproject dmoz
> SyntaxError: invalid syntax
> >>>

You're typing this inside the Python interpreter. It is a shell command,
so you must run it from the Windows command line.

Open a command prompt from the Windows menu and type:

C:\> python scrapy-ctl.py startproject dmoz
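
To see why IDLE answered with SyntaxError, here is a minimal sketch (the helper name is_valid_python is just for illustration). It asks Python to compile the tutorial command as if it were Python source, which is roughly what the >>> prompt does with whatever you type:

```python
# The tutorial command is meant for the shell, not for the >>> prompt.
command = "python scrapy-ctl.py startproject dmoz"

def is_valid_python(source):
    """Return True if `source` compiles as a single Python statement."""
    try:
        compile(source, "<input>", "single")
        return True
    except SyntaxError:
        return False

# The shell command is rejected; a real Python statement is accepted.
print(is_valid_python(command))
print(is_valid_python("import sys"))
```

The first call reports that the command is not valid Python, which matches the error above. Nothing was wrong with scrapy-ctl.py itself at that point.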

Andres.

> --
> You received this message because you are subscribed to the Google Groups "scrapy-users" group.
> To post to this group, send email to scrapy...@googlegroups.com.
> To unsubscribe from this group, send email to scrapy-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/scrapy-users?hl=en.

Republic

Feb 25, 2010, 11:36:32 PM
to scrapy-users
Hi Andres,

ok. I tried out your recommendation and did it in the windows command
line;

I typed the following in

C:\> python scrapy-ctl.py startproject paul_smith

I got the following reply:

python: can't open file 'scrapy-ctl.py': [Errno 2] No such file or
directory

I have checked and scrapy-ctl.py is found in c:\Python26\scripts

hope you can enlighten me? or anyone else can share?


FM

Foo Meng Ng

Feb 26, 2010, 1:17:30 AM
to scrapy-users
Hey andres,

I think I solved it....

instead of typing

C:\> python scrapy-ctl.py startproject dmoz

I typed

C:\> scrapy-ctl.py startproject dmoz

and it worked.

For some reason I was able to do it in windows command line without
typing python.

Not sure why?

FM

--
Ng Foo Meng

Billy Mac

Feb 26, 2010, 7:50:28 PM
to scrapy-users
Republic, you're not the only one.

I have to say I've been struggling with this as well since yesterday,
I can't seem to get it to create that dmoz directory either.
The only way I was able to get it to do anything was by using
Windows PowerShell (x86):
C:\Users\Home> scrapy-ctl.py startproject dmoz
When I hit enter, a small command-prompt-looking window popped up for
a second or two.

But the files it created are far from organized (unless that's
how they're supposed to be, but I doubt it). There are new files spread
across multiple directories: where I had, say, 2 copies of the same
file, I now have 3, stamped with the exact date and time that I
ran the command in Windows PowerShell about an hour ago. I
tried the regular Windows command prompt first without success, then tried
entering it in the "Run" dialog. Nothing worked, and what I was able
to create looks nothing like the nicey nice folder of files it's
supposed to create, described below from the tutorial. lol

Before you start scraping, you will have to set up a new Scrapy project.
Enter a directory where you’d like to store your code and then run:
python scrapy-ctl.py startproject dmoz
This will create a dmoz directory with the following contents:
dmoz/
    scrapy-ctl.py
    dmoz/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
-------------------------------------------
I tried creating a dmoz folder in: python26\scripts....no good
Tried creating a blank file dmoz.py....no good


I have done everything exactly as Republic described above, with the
exception of my path setting, which is under Environment Variables - System
variables as: PATH C:\Python26\Scripts (I also tried it in the Users Home
variables with no luck). I thought it would work in the Users Home
variables because in the command prompt and PowerShell the prompt starts
with: C:\Users\Home>

I have windows 7 ultimate x64 (maybe there's a bug with win7, I don't
know, probably operator error though I'm thinkin...lol), I downloaded
the python v2.6.4 x86 and I can't seem to get going with this. I am a
beginner as well and have been using iMacros for the last month or so,
but am growing extremely frustrated with how slow it is. I have read
that there are far better and efficient ways to do this with Python
instead of iMacros, and after reading how easy it looked like it could
be on the Scrapy home page I decided to give it a try, so I am really
eager to learn and get started with this and hopefully it's something
stupid simple I forgot to do.

If anyone could point us in the right direction I'd really appreciate it,
thanks in advance,
BillyMac.

PS tried Foo Meng's way too, no good

Rolando Espinoza La Fuente

Feb 26, 2010, 9:36:35 PM
to scrapy...@googlegroups.com
Take note that the commands in the examples and the tutorial are meant
for a shell: bash/sh on Linux/Unix-like systems and cmd/PowerShell on Windows.

To be sure that python is accessible from the command line, just open
your terminal/shell/cmd/PowerShell and type: python
You will see the Python interpreter's startup message,
ending with the "python prompt": >>>

Everything that you see in the tutorial or examples
with >>> is meant to be typed in the Python interpreter.

e.g.

c:\> python
Python 2.5.2
[ messages here ]
Type "help", etc
>>>
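
If typing python at the prompt is not recognized at all, a common cause on Windows is that the Python install directory is missing from PATH. Here is a rough sketch for checking a PATH-style string. The helper name path_contains and the sample value are made up for illustration, and C:\Python26 is only assumed from the install described earlier in this thread:

```python
import os

def path_contains(path_value, directory, sep=os.pathsep):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(sep):
        if entry and os.path.normcase(os.path.normpath(entry)) == wanted:
            return True
    return False

# Sample value written out by hand (an assumption, not read from a real machine).
sample_path = r"C:\Windows\system32;C:\Python26;C:\Python26\Scripts"
has_python = path_contains(sample_path, r"C:\Python26", sep=";")
has_other = path_contains(sample_path, r"C:\Python31", sep=";")
print(has_python, has_other)

# On your own machine, check the real variable instead:
# path_contains(os.environ.get("PATH", ""), r"C:\Python26")
```

Keep in mind that a command prompt opened before you changed the environment variables still has the old PATH, so open a new one after editing.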

And to be sure that you have Scrapy installed in your python environment,
in the python interpreter type: import scrapy
and no error should be raised

e.g.

>>> import scrapy
>>>

If you get something like

>>> import scrapy
Traceback (...)
...
ImportError: No module named scrapy

That means that you have not installed Scrapy properly in
your Python environment.
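
The same check can be done without triggering a traceback. This is just a sketch using the standard library's importlib (available in Python 3, not in the Python 2.6 mentioned in this thread); the helper name is_installed is illustrative:

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None

# "os" ships with Python, so this is always found.
print(is_installed("os"))
# Whether this is found depends on your own environment.
print(is_installed("scrapy"))
```

If the second line reports that scrapy is not found, the interpreter you are running is not the one Scrapy was installed into, or the install did not complete.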

About the project structure, I will try to explain what the files are

> dmoz/ <- this is your project root
>    scrapy-ctl.py <- this is the "scrapy command" for your project
>    dmoz/ <- this is the "project package", python meaning
>        __init__.py <-
>        items.py <- All the .py files are python scripts related to your project
>        pipelines.py
>        settings.py
>        spiders/ <- This directory will contain the spiders that you create
>            __init__.py
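
If you are unsure whether "startproject" produced the layout above, something like the following can report what is missing. This is only a sketch: the helper name missing_project_files is made up, the file list is copied from the tree above, and the demo builds a throwaway copy of the layout rather than calling Scrapy:

```python
import os
import tempfile

def missing_project_files(project_root, package="dmoz"):
    """List the expected files that are absent under `project_root`."""
    expected = [
        "scrapy-ctl.py",
        os.path.join(package, "__init__.py"),
        os.path.join(package, "items.py"),
        os.path.join(package, "pipelines.py"),
        os.path.join(package, "settings.py"),
        os.path.join(package, "spiders", "__init__.py"),
    ]
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(project_root, rel))]

# Build a throwaway copy of the layout to show the check passing.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "dmoz", "spiders"))
for rel in ("scrapy-ctl.py", "dmoz/__init__.py", "dmoz/items.py",
            "dmoz/pipelines.py", "dmoz/settings.py", "dmoz/spiders/__init__.py"):
    open(os.path.join(demo_root, *rel.split("/")), "w").close()

empty_root = tempfile.mkdtemp()
print(missing_project_files(demo_root))   # complete layout
print(missing_project_files(empty_root))  # empty directory
```

Point it at your own project root (the outer dmoz directory) to see which files, if any, ended up somewhere else.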

In the tutorial, when you see "python scrapy-ctl.py ...", the command is
meant to be run inside your project root.
For the dmoz example, that means inside the first "dmoz" directory.

e.g.

c:\> cd dmoz
dmoz> python scrapy-ctl.py crawl dmoz.org

If you haven't created the dmoz project yet, you need to locate the scrapy-ctl.py
where Scrapy is installed and run "python scrapy-ctl.py startproject dmoz".

In short "python some_file.py" means that you need to be in the directory
where "some_file.py" is located.
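
This also explains what happened earlier in the thread, where "python scrapy-ctl.py" failed from C:\ while plain "scrapy-ctl.py" worked. The sketch below imitates the two lookups under simplifying assumptions: the function names are made up, the temporary directories stand in for the real ones, and the real Windows shell additionally checks the current directory and relies on .py files being associated with Python:

```python
import os
import tempfile

def resolve_for_python(script_name, cwd):
    """Mimic `python script_name`: only the current directory is checked."""
    candidate = os.path.join(cwd, script_name)
    return candidate if os.path.isfile(candidate) else None

def resolve_for_shell(script_name, path_dirs):
    """Mimic typing `script_name` alone: each PATH directory is searched in order."""
    for directory in path_dirs:
        candidate = os.path.join(directory, script_name)
        if os.path.isfile(candidate):
            return candidate
    return None

scripts_dir = tempfile.mkdtemp()   # stands in for C:\Python26\Scripts
work_dir = tempfile.mkdtemp()      # stands in for a directory without the script
open(os.path.join(scripts_dir, "scrapy-ctl.py"), "w").close()

via_python = resolve_for_python("scrapy-ctl.py", cwd=work_dir)
via_shell = resolve_for_shell("scrapy-ctl.py", path_dirs=[scripts_dir])
print(via_python)  # not found: the file is not in the working directory
print(via_shell)   # found through the PATH-style search
```

So "python scrapy-ctl.py" needs either the right working directory or a full path to the script, while the bare script name is found through PATH.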

That point isn't clear in the tutorial. I think that's because the
tutorial is written
in a platform-agnostic way and assumes a minimal knowledge of Python and
of how to run commands from the command line.

I hope that helps. Anyway, I'm not a native English speaker :)

You should learn more about Python, because Scrapy is a scraping framework *for Python*
and you really need to understand how to program in Python.

There are plenty of resources on the net about Python programming. You can start
with the official getting started guide:
http://www.python.org/about/gettingstarted/

The tutorial also suggests some resources:
http://www.diveintopython.org/
http://wiki.python.org/moin/BeginnersGuide/NonProgrammers

Regards,

Rolando

Rolando Espinoza La Fuente

Feb 26, 2010, 9:51:41 PM
to scrapy...@googlegroups.com
On Fri, Feb 26, 2010 at 2:17 AM, Foo Meng Ng <ngfo...@gmail.com> wrote:
> Hey andres,
>
> I think I solved it....
>
> instead of typing
>
> C:\> python scrapy-ctl.py startproject dmoz
>
> I typed
>
> C:\> scrapy-ctl.py startproject dmoz
>
> and it worked.
>
> For some reason I was able to do it in windows command line without
> typing python.
>
> Not sure why?

Maybe that's a weak point in the tutorial.

There will be two "scrapy-ctl.py" scripts:

- the one installed "globally". You usually use this one for the
"startproject" command, and if Scrapy is installed correctly you can
run it from anywhere

- the one located in your project directory, the directory
created by "startproject"

The difference?

The scrapy-ctl.py located inside your project directory "knows about
the project". And any
command related to your project should be run with this script.

For example, try running the global scrapy-ctl.py

C:\> scrapy-ctl.py genspider

You will see the error:

Error running: scrapy-ctl.py genspider
Cannot find project settings module in python path: scrapy_settings

As you can see, there is a "settings module", the one located inside
your project directory.
e.g. dmoz\dmoz\settings.py
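
The "in python path" part of that message is the key. The sketch below does not reproduce Scrapy's own lookup; it only shows the general Python rule that a package can be imported once its parent directory is on sys.path (and Python puts the directory of the script being run at the front of sys.path). The package name dmoz_demo and the helper can_import are made up for the demo:

```python
import importlib
import os
import sys
import tempfile

def can_import(module_name):
    """Return True if `module_name` can be imported right now."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

# Throwaway project root containing a package with a settings module.
project_root = tempfile.mkdtemp()
package_dir = os.path.join(project_root, "dmoz_demo")
os.mkdir(package_dir)
for filename in ("__init__.py", "settings.py"):
    open(os.path.join(package_dir, filename), "w").close()

before = can_import("dmoz_demo.settings")   # project root not on sys.path yet
sys.path.insert(0, project_root)            # roughly what running a script there does
importlib.invalidate_caches()
after = can_import("dmoz_demo.settings")
print(before, after)
```

That is why a script sitting in the project root can see the project's settings module, while a script run from elsewhere cannot unless the path is set up some other way.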

If you try the "project's scrapy-ctl.py" you will see a different result:

c:\> cd dmoz
c:\dmoz> python scrapy-ctl.py genspider
Scrapy 0.8 ....
[ lot of text here]
Available commands
... etc

You must use the "project's scrapy-ctl.py" to create spiders, run
spiders, crawl domains,
start the interactive shell, etc.

e.g.

c:\dmoz> python scrapy-ctl.py genspider target_spider target.com
Created spider 'target.com' using template 'crawl' in module:
images.spiders.target_spider

The "genspider" command creates a spider inside your "project module" using
a preset template.

Just check the newly created spider that -for dmoz example- is located in
c:\dmoz\dmoz\spiders\target_spider.py

You can see how Scrapy "detects" your spiders:
c:\dmoz> python scrapy-ctl.py list
target.com

And you can run your spider
c:\dmoz> python scrapy-ctl.py crawl target.com
[lot of text]
2010-02-26 22:49:24-0400 [target.com] INFO: Spider opened
[lot of text]

From there you just need to program your spider. You may need to learn
about XPath
expressions, text extraction, etc.
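
To get a first feel for XPath-style queries without touching Scrapy, here is a small sketch with the standard library's xml.etree.ElementTree. It supports only a limited subset of XPath and is not Scrapy's selector API; the sample markup and URLs are made up:

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed sample page (invented for this example).
page = """<html><body>
<ul>
  <li><a href="http://example.com/a">First</a></li>
  <li><a href="http://example.com/b">Second</a></li>
</ul>
</body></html>"""

root = ET.fromstring(page)

# ".//li/a" selects every <a> that is a direct child of an <li>, at any depth.
links = [(a.text, a.get("href")) for a in root.findall(".//li/a")]
for text, href in links:
    print(text, href)
```

Real web pages are often not well-formed XML, so this parser will reject many of them; the point here is only the shape of the query.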

Regards,

Rolando

Billy Mac

Feb 27, 2010, 6:37:39 AM
to scrapy-users
Hey Rolando,

I really appreciate you taking the time to reply.

I feel like an idiot now!.....Duh.........lol

Last nite after watching some videos on the very basics, I kinda
figured out that the trouble I was having was because I hadn't
imported scrapy, like you have to do for any module, I thought I could
just wing it, and follow along with that tutorial and have a working
spider to crawl sites for me when I was done......WRONG! lol

That tutorial assumes you know the very basics of Python

I got way ahead of myself and overlooked the very basic of basics in
python because I wanted to run....before I first learned how to
walk....lol
And I sincerely apologize for that and wasting everyone's time because
I didn't bother to read and learn the very basics of Python
first...Sorry about that,

As I said earlier I have been using iMacros for scraping certain urls
from sites however it's extremely slow and my lack of knowledge for
any programming language hampers me big time. I'm so limited in what I
can do because I don't know the very basics of any programming
language. If I could incorporate vbs into my macro scripts I might be
able to get them to run better, and the reason why I started
researching a programming language to learn...I figured I'd start with
Python as I read a lot of people asking similar questions on which
language to start with and Python always came up as a good first
language to learn.

This was a lesson well learned!....Don't Overlook The Very Basics!

I have found a series of about 40 videos in all on the
very basics of Python...As soon as I got to the 4th video on "Modules
and Functions" I knew right away...that was why I was having problems
and felt like putting my foot in my mouth after that last post...lol

Anyone else having similar problems might want to go through this series
on the basics (it all makes sense now). You can find them here on
YouTube; this link starts with the first one out of over 40 videos:
http://www.youtube.com/watch?v=4Mf0h3HphEA&feature=SeriesPlayList&p=EA1FEF17E1E5C0DA

I appreciate all the input on this, thanks a lot guys,
BillyMac.

Foo Meng Ng

Feb 28, 2010, 4:32:30 AM
to scrapy...@googlegroups.com
Hi Rolando,

Yeah many thanks for the inputs, I am most grateful. Same sentiments
as Billy. I am trying to fly before I can even start crawling.

I am going through the python tutorials as we speak now. Many thanks.

But indeed Scrapy is a good tool and I will want to use it for my
school's research project.

FM

--
Ng Foo Meng
