Re: Scrapy book project

Tsouras

Mar 6, 2013, 11:46:28 AM

to scrapy...@googlegroups.com

Great idea!

You should add a chapter on how to avoid being banned (using delays, Tor, etc.) and one more on scrapyd.
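
For readers following the thread, the "delay" part of this suggestion could look roughly like the fragment of a Scrapy project's settings.py below. This is only a sketch: the setting names are Scrapy's own, but the values are illustrative assumptions rather than recommendations from anyone in this thread, and routing through Tor is not shown.

```python
# settings.py fragment (illustrative values, chosen for this sketch only)

# Wait about 2 seconds between requests to the same site.
DOWNLOAD_DELAY = 2.0

# Vary the wait (0.5x to 1.5x of DOWNLOAD_DELAY) so requests look less regular.
RANDOMIZE_DOWNLOAD_DELAY = True

# Allow only one request at a time per domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let the AutoThrottle extension adjust the delay based on server latency.
AUTOTHROTTLE_ENABLED = True

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True
```

Slower, more polite crawling trades throughput for a lower chance of being blocked; the right values depend on the target site.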

On Tuesday, March 5, 2013 11:56:47 PM UTC+2, Balthazar Rouberol wrote:

Hello all,

I've recently been offered the chance to write a 50-70 page beginner's guide to Scrapy, aimed at non-Python users and web developers who are building apps that rely on data scraped from the web.

I would be happy to know what the Scrapy community thinks about this project, and to hear any of your ideas!

I'm still working on the outline, but so far, I'm thinking about the following:
• Install + development environment setup
• Writing first project
• Define the data you want to collect
• Scraping a template-based website
• Exporting data to generic formats (XML, JSON, CSV, YAML, etc.) and writing your own item exporter
• Storing the data in a database (MongoDB?, PostgreSQL?, ...)
• Scraping an RSS/Atom feed
• Scraping a highly consulted website
• Writing a middleware to limit scraping to recent pages (i.e., avoid scraping a static page multiple times)
• Scraping AJAX pages / using AJAX calls as an API
• Scraping JS-generated pages (ghost.py? webkit?)
• Scraping in the cloud (I'm thinking Scrapy Cloud and Heroku or OpenShift)

If you think I've missed a major point, or if you have any questions or suggestions, I'd be happy to hear them.

Cheers

--
B

Sidnei Pereira

Mar 8, 2013, 7:00:48 AM

to scrapy...@googlegroups.com

I'd suggest the same as Tsouras.

On Thursday, March 7, 2013 4:28:13 AM UTC-3, 郭冬冬 wrote:

PhantomJS is a good option for scraping JS-generated pages.

On Wednesday, March 6, 2013 at 5:56:47 AM UTC+8, Balthazar Rouberol wrote:

Espen Klem

Mar 8, 2013, 7:04:27 AM

to scrapy...@googlegroups.com

I'm no programmer and think you may want to add a step between:

• Install + development environment setup
• Writing first project

My biggest obstacle was that I jumped right into writing a project instead of playing with debugging methods first. So, in the Python shell (or does Scrapy have its own shell?), try to extract one XPath expression and see what the output is.
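
To illustrate the kind of experiment described above, here is a small sketch that pulls one value out of a page fragment with an XPath-style query. It uses Python's standard-library xml.etree.ElementTree as a stand-in so that it runs without Scrapy installed (ElementTree supports only a subset of XPath); the sample markup and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for this example.
SAMPLE = """
<html>
  <body>
    <div class="post">
      <h1>Scrapy book project</h1>
      <a href="/page/2">next</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# Try one expression at a time and look at what comes back.
title = root.find(".//div[@class='post']/h1").text
links = [a.get("href") for a in root.findall(".//a")]

print(title)
print(links)
```

Scrapy's own interactive shell (`scrapy shell <url>`) allows the same one-expression-at-a-time experiment against a real page.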

Good luck!

Best regards,
Espen

Espen Klem

Mar 8, 2013, 7:10:25 AM

to scrapy...@googlegroups.com

I guess I'm thinking about this: http://doc.scrapy.org/en/0.16/topics/shell.html#topics-shell

The last time I tried was around version 0.8, and if I remember correctly there was a bit more on this topic.

Still best regards =)

Espen Klem

gen chen

Mar 8, 2013, 8:27:21 AM

to scrapy...@googlegroups.com

How about introducing scrapyd and performance optimization? Thanks.

Neverlast N

Mar 9, 2013, 7:49:54 AM

to scrapy...@googlegroups.com

Hello, I've been working on a similar book for the last 6-7 months with a major publisher. I hope this is not a problem :) I'm sure there is enough space for all of us.

Cheers,

Dimitris

Balthazar Rouberol

Mar 9, 2013, 3:48:35 PM

to scrapy...@googlegroups.com

Hi all,

Thanks for all your kind replies and suggestions.

A broad introduction to the scrapy shell clearly is a must-have for the reader to be able to test his/her own code.

I planned to talk about how to avoid being banned in the "Scraping a highly consulted website" chapter, maybe taking Wikipedia or a similar site as the example.

Thanks again
Balthazar

Caco Caqueira

Mar 10, 2013, 12:32:41 PM

to scrapy...@googlegroups.com

Personally, I think the official documentation is a great start. They even release it in epub format! Why not simply improve it?

2013/3/9 Balthazar Rouberol <roube...@gmail.com>

--

Neverlast N

Mar 10, 2013, 3:47:27 PM

to scrapy...@googlegroups.com

A book or two will give lots of publicity to Scrapy and make it more available to people who wouldn't have access to it otherwise. I think we are missing that right now. :)

That's what I feel. Official documentation is great... but it's a long way to get there through Amazon without any books on Scrapy out there ;)

Jerry Wu

Jun 12, 2014, 8:57:04 AM

to scrapy...@googlegroups.com

Hello, Balthazar

Great idea. I am new to scrapy and now struggling to start my first web project. Could you let me know if it is available now? I am surely looking forward to it.

shahidashraff

Jun 15, 2014, 1:20:06 PM

to scrapy...@googlegroups.com

Hi

You could add a topic on distributed crawling.

That would be good; I can help you.

Sidnei Pereira

Jun 16, 2014, 11:03:29 AM

to scrapy...@googlegroups.com

+1 for distributed crawling

Aaron Tao

Oct 29, 2014, 7:28:26 AM

to scrapy...@googlegroups.com

+1 for distributed crawling!

On Monday, June 16, 2014 11:03:29 PM UTC+8, Sidnei Pereira wrote:

+1 for distributed crawling
