Where can I find a proper tutorial about Scrapy?


ivanov

Aug 10, 2015, 11:12:42 AM
to scrapy-users
Hi ...

I think Scrapy is a very interesting tool. But it lacks proper documentation and tutorials.

It is very hard to find blogs that write about Scrapy. Seven years after its release, not a single book has been published. Worst of all, the official docs on scrapy.org are very frustrating and hard to understand. There is so much theory, but so few examples.

Can anyone tell me where I can find tutorials about Scrapy? Or maybe a "Scrapy for Dummies"?

Jakob de Maeyer

Aug 11, 2015, 4:10:10 AM
to scrapy-users
Hey Ivanov,

Scrapy is actually a really well documented project. What part of the docs do you struggle with? The Scrapy Tutorial and Scrapy at a Glance pages have plenty of example code and explanations for beginners. Most of the more advanced topics have many examples as well. There is a list of external resources in the GitHub wiki, and you'll also find a few external tutorials and videos by googling "Scrapy Tutorial".


Cheers,
-Jakob

ivanov

Aug 11, 2015, 6:20:13 AM
to scrapy-users
Hi Jakob :) Thank you for your response.

Yes, the "Scrapy Tutorial" and "Scrapy at a Glance" pages are quite easy, and there is a bunch of writing about them.

But the problem starts when it comes to the "Spiders" section, the "Item Pipeline" section, and the more advanced topics. The official examples are hard to understand.

If the official explanations of these more advanced topics are easy to understand, then how come almost no blogs write about them?

Most of the blogs stop at "recursive scraping". They don't use "item pipeline" or anything.


ivanov

Aug 11, 2015, 7:24:10 AM
to scrapy-users
Jakob, if you don't mind, can you teach me how to use "pipelines.py"?

tricia....@gmail.com

Aug 11, 2015, 4:22:41 PM
to scrapy-users
Hi Ivanov,

I think my position is probably similar to yours: new to Scrapy and interested in using it for a project more than in understanding Scrapy itself. I found this YouTube video quite helpful as an extension of the Scrapy Tutorial that Jakob mentions. The video is not perfect, and it takes several viewings to get everything out of it, but it addresses a more complex test case than the official tutorial, which helped me start to get a handle on pipelines and more complex processes with Scrapy.

Best,
Tricia

Asheesh Laroia

Aug 12, 2015, 12:09:57 AM
to scrapy...@googlegroups.com
Another video that may (or may not!) help is this ~28 minute video from me at PyCon: https://www.youtube.com/watch?v=-JzH8TcwqxI

I've heard it is good, and I thought it was good, but naturally I'm biased.

The best thing about this video is the diagrams by Karen Rustad Tölva, in my opinion. They begin here: https://youtu.be/-JzH8TcwqxI?t=9m5s (but it probably makes sense to see the beginning of the video first).

I'd be very happy if someone wants to integrate these into the official Scrapy tutorial at any point.

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.

ivanov

Aug 13, 2015, 5:20:01 AM
to scrapy-users
Thank you Tricia & Asheesh. I will watch the video on Saturday or Sunday. I will give my opinion later.

ivanov

Aug 17, 2015, 7:35:46 AM
to scrapy-users
Can anyone teach me how to use pipelines properly? Or maybe you can point me to a tutorial blog about pipelines.

Please don't recommend the official docs.

Jakob de Maeyer

Aug 17, 2015, 8:14:41 AM
to scrapy...@googlegroups.com
Hey Ivanov,

now I'm unsure whether you received my private mail from the 11th, so
here it is again:

Hey Ivanov,

I can point you in the right direction, but really, it's all there in
the docs.

Pipelines are a really easy concept: Every Item that is scraped (i.e.
yielded or returned) by the Spider is given to the process_item() method
of all pipelines. This method can then inspect and modify the item and
must do one of two things:
- if it returns the Item, it will be processed by the next pipeline, or
if there is no further pipeline, go to the feed exports (see
http://doc.scrapy.org/en/latest/intro/tutorial.html#storing-the-scraped-data)
- if it raises scrapy.exceptions.DropItem, this particular item will
stop being processed, end of story. You can use this if you want to
filter your items for certain characteristics.
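
As a rough illustration of those two outcomes, here is a minimal sketch.
The class name PriceFilterPipeline, the "price" field, and the
run_pipelines() helper are made up for this example. run_pipelines()
only imitates the chaining described above so the snippet can run on
its own; it is not Scrapy's actual code. The DropItem fallback is there
purely so the snippet also works where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so the sketch also runs where Scrapy is not installed.
    class DropItem(Exception):
        pass


class PriceFilterPipeline:
    """Hypothetical pipeline: drops items without a price, normalizes the rest."""

    def process_item(self, item, spider):
        if not item.get("price"):
            # Raising DropItem stops all further processing of this item.
            raise DropItem("missing price")
        item["price"] = float(item["price"])
        # Returning the item hands it to the next pipeline (or the feed export).
        return item


def run_pipelines(items, pipelines, spider=None):
    """Toy imitation of the chaining described above (an assumption, not Scrapy code)."""
    kept = []
    for item in items:
        try:
            for pipeline in pipelines:
                item = pipeline.process_item(item, spider)
        except DropItem:
            continue  # a dropped item never reaches later pipelines
        kept.append(item)
    return kept


scraped = [{"name": "book", "price": "12.50"}, {"name": "pen", "price": ""}]
kept = run_pipelines(scraped, [PriceFilterPipeline()])
```

In a real project only the pipeline class would live in pipelines.py;
Scrapy itself takes care of calling process_item() for every item.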

There are a couple of extra methods you *can* implement if you want,
e.g. to open/close files or database connections, but literally all that
a pipeline *must* do is have a process_item() method. All methods, their
signatures, and their use cases are explained here:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html#writing-your-own-item-pipeline
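
To give an idea of the optional methods, here is a small sketch that
uses open_spider() and close_spider() to manage a file. The class name
JsonLinesPipeline, the "items.jl" default path, and the "title" field
are assumptions for illustration. The last lines call the hooks by hand
in the order Scrapy would, only so the snippet runs without a crawl.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Hypothetical pipeline that writes each item as one JSON line."""

    def __init__(self, path="items.jl"):
        self.path = path  # assumed output path, not a Scrapy setting
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider is opened: acquire resources here.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider is closed: release them here.
        self.file.close()

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item)) + "\n")
        return item


# Driving the hooks by hand, in the order Scrapy would call them.
path = os.path.join(tempfile.mkdtemp(), "items.jl")
pipeline = JsonLinesPipeline(path)
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "example"}, spider=None)
pipeline.close_spider(spider=None)

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```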

The most common use case for pipelines is to write scraped data to a
database. The docs have an example for MongoDB:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb
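
If you just want to try the idea without setting up MongoDB, the same
pattern can be sketched with SQLite from the Python standard library.
Note that this is not the example from the docs: the class name
SQLitePipeline, the table layout, and the "title"/"url" fields are all
assumptions, and the hooks are called by hand at the end only so the
snippet runs on its own.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline storing items in SQLite (swapped in for MongoDB)."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path  # assumed default: a throwaway in-memory database
        self.conn = None

    def open_spider(self, spider):
        # Open the connection once per crawl and make sure the table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT)"
        )

    def close_spider(self, spider):
        # Persist everything and release the connection when the crawl ends.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        self.conn.execute(
            "INSERT INTO items (title, url) VALUES (?, ?)",
            (item.get("title"), item.get("url")),
        )
        return item


# Driving the hooks by hand, in the order Scrapy would call them.
pipeline = SQLitePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "Example", "url": "http://example.com"}, spider=None)
rows = pipeline.conn.execute("SELECT title, url FROM items").fetchall()
pipeline.close_spider(spider=None)
```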

You can have multiple pipelines, and the items will be processed in the
order you set in your ITEM_PIPELINES setting (which you set in your
settings.py file), as explained here:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html#activating-an-item-pipeline-component
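
For example, a settings.py fragment could look like the sketch below.
The module paths and class names are hypothetical. The numbers decide
the order: items go through the pipeline with the lowest number first
(by convention the values are chosen from the 0-1000 range). The sorted()
call is not something you need in settings.py; it is only there to show
the resulting order.

```python
# settings.py (the module paths below are hypothetical)
ITEM_PIPELINES = {
    "myproject.pipelines.StoragePipeline": 800,
    "myproject.pipelines.ValidationPipeline": 300,
}

# Illustration only: items pass through the pipelines in ascending
# order of these numbers, regardless of the order in the dict.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```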

Whether you need item pipelines at all really depends on what you want
to do.


Cheers,
-Jakob

ivanov

Aug 21, 2015, 12:11:41 AM
to scrapy-users
Hi Jakob :)

Thanks for your help & for your attention.

Unfortunately, I can hardly grasp the official docs. And I think I have told you from the beginning that the official docs are killing me :D If the docs could be understood easily, I wouldn't be here. haha...

Oh, by the way, I found a good alternative for learning about Scrapy pipelines. I think this blog post and this forum thread are good for a newbie like me:

http://www.smallsurething.com/web-scraping-article-extraction-and-sentiment-analysis-with-scrapy-goose-and-textblob/

https://stackoverflow.com/questions/29946989/renaming-downloaded-images-in-scrapy-0-24-with-content-from-an-item-field-while

Thanks Guys :)


NB: SCRAPY SHOULD HIRE SOMEBODY TO RE-WRITE ITS DOCS. THEY ARE VERY FRUSTRATING TO READ.

Travis

Aug 21, 2015, 1:31:09 AM
to scrapy...@googlegroups.com
One might suggest, since this excellent tool is proving valuable to you, that you either provide financial support or help with the documentation yourself.

Otherwise it would be easy to mistake your comments as coming from someone without any sense of gratitude, or perhaps an inflated sense of entitlement.



Malik Rumi

Aug 23, 2015, 10:49:17 PM
to scrapy-users
Being somewhat impatient myself, I can understand where Ivanov is coming from, but I think this is a common reality in open source, and it is only complicated if there is a language issue. Anyone with a basic understanding of the topic (and a lot of people who don't even seem to have that) and who wants to make a name for themselves can throw up an introductory youtube video, but advanced topics require advanced understanding, and most people don't need or want that. So the 'return' for an advanced topic blog or video is smaller, because it reaches a smaller audience, an audience more likely to know a thing or two themselves and so not let a tutorial creator get away with just phoning it in. In short, you just have to deal with it. 

That said, I do recommend, and have viewed many times, both the relatively new Reddit video that Tricia talks about and Asheesh's. To that list let me add https://www.youtube.com/playlist?list=PL51BA5190961CFEE3, which is the only long-form, multi-episode Scrapy video tutorial I've seen that talks about advanced topics like getting your results into a db, rather than stopping at telling you it can be done.

Nevertheless, as a wise person recently told me, you just have to let it fly and see what happens. You learn a lot that way, and it will be directly applicable to your specific projects. And if you then go back to the official docs, I think you will find after that experience that the docs start to make more sense. 