Single Scrapy Project vs. Multiple Projects for Various Sources

Andre King

Mar 29, 2017, 13:06:37
to scrapy-users
Hello Scrapy Users!

For scraping various sources (e.g. Stack Overflow, Wikipedia, GitHub, etc.), is it advisable to put all spiders under a single project, or to use multiple Scrapy projects?

Thanks,
Andre

Jakob de Maeyer

Mar 30, 2017, 04:32:41
to scrapy-users
Hey Andre,

Whether spiders should go into the same project is mainly determined by the type of data they scrape, not by where the data comes from.

Say you are scraping user profiles from all your target sites. Then you may have an item pipeline that cleans and validates user avatars, and another one that exports them into your "avatars" database. In that case it makes sense to put all spiders into the same project: they all use the same pipelines, because the data always has the same shape no matter where it was scraped from. On the other hand, if you are scraping questions from Stack Overflow, user profiles from Wikipedia, and issues from GitHub, and you validate, process, and export each of these data types differently, it makes more sense to put the spiders into separate projects.
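A minimal sketch of the shared-pipeline case, assuming a hypothetical `UserProfile` item with `source`, `username`, and `avatar_url` fields. Scrapy item pipelines are plain classes with a `process_item(self, item, spider)` method, and Scrapy accepts dataclass objects as items; the validation rule (require an http(s) avatar URL) is an illustrative assumption, not something from the thread. Because every spider yields the same item shape, one pipeline serves all of them.

```python
from dataclasses import dataclass

try:
    from scrapy.exceptions import DropItem  # Scrapy's exception for discarding an item
except ImportError:  # fallback so the sketch runs without Scrapy installed
    class DropItem(Exception):
        pass


@dataclass
class UserProfile:
    # Hypothetical item definition shared by every spider in the project.
    source: str  # e.g. "stackoverflow", "wikipedia", "github"
    username: str
    avatar_url: str


class AvatarValidationPipeline:
    """Hypothetical pipeline: cleans and validates avatars for items from any spider."""

    def process_item(self, item, spider):
        url = item.avatar_url.strip()  # clean: drop surrounding whitespace
        if not url.startswith(("http://", "https://")):
            # validate: discard items whose avatar URL is not http(s)
            raise DropItem(f"invalid avatar URL from {item.source}: {item.avatar_url!r}")
        item.avatar_url = url
        return item
```

The pipeline never inspects which site the item came from, which is exactly why spiders for different sites can share it inside one project.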

In other words, if your spiders have common dependencies (e.g. they share item definitions, pipelines, or middlewares), they probably belong in the same project; if each of them has its own specific dependencies, they probably belong in separate projects.
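The rule of thumb above can be made concrete as a grouping problem: treat each spider's components (item definitions, pipelines, middlewares) as a set, and put spiders whose sets overlap, directly or through another spider, into the same project. This is a hypothetical helper for illustration, not a Scrapy feature; the spider and component names used with it are made up.

```python
def group_into_projects(spider_deps):
    """Group spiders that (transitively) share any dependency.

    spider_deps maps a spider name to the set of components it uses.
    Returns a sorted list of sorted spider-name lists, one per suggested project.
    """
    parent = {name: name for name in spider_deps}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Record the first spider that uses each component; merge later users into it.
    owner = {}
    for name, deps in spider_deps.items():
        for dep in deps:
            if dep in owner:
                parent[find(name)] = find(owner[dep])
            else:
                owner[dep] = name

    # Collect spiders by their root: each root is one suggested project.
    groups = {}
    for name in spider_deps:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Any shared component merges spiders here, which is stricter than "mainly determined by the type of data"; in practice you would probably feed in only the components that are specific to a data type, not generic ones such as a proxy middleware.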


Cheers,
-Jakob

Lhassan Baazzi

Mar 30, 2017, 11:02:33
to scrapy...@googlegroups.com
Hi,

I found Jakob's answer useful, thank you for sharing it :)

What do others think? I think this would make a good discussion.


Best Regards.
Lhassan.



--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users+unsubscribe@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.

Andre King

Apr 2, 2017, 22:20:53
to scrapy-users
I found an older post with a similar discussion: https://groups.google.com/forum/#!topic/scrapy-users/hEvpgppVv9s

Andre

