multi pipelines
From: vitsin <vitaly.sinit...@gmail.com>
Date: Wed, 24 Nov 2010 20:19:43 -0800 (PST)
Local: Wed, Nov 24 2010 11:19 pm
Subject: multi pipelines
hi,
How are two pipelines supposed to be defined if each one runs
completely different SQL queries?
As two classes in the same pipelines.py file?
I've tried:

# Both ScanFirst and ScanSecond are SQLAlchemy mappings to existing DB tables.
from .tables.scanfirst import ScanFirst
from .tables.scansecond import ScanSecond
from test.items import FirstItem
from test.items import SecondItem

class FirstPipeline(object):
    def process_item(self, item, spider):
        scan_res = ScanFirst( ... )

class SecondPipeline(object):
    def process_item(self, item, spider):
        scan_res = ScanSecond( ... )

Later, when a spider activates only SecondPipeline like this:
class FirstSpider(CrawlSpider):
    settings.overrides['ITEM_PIPELINES'] = ['test.pipelines.SecondPipeline']
    ... (rest of spider code)

then I see that the code for FirstPipeline is activated. Why?

10x,
--vs


 
From: Pablo Hoffman <pablohoff...@gmail.com>
Date: Thu, 25 Nov 2010 12:28:56 -0200
Local: Thurs, Nov 25 2010 9:28 am
Subject: Re: multi pipelines
Hi vitsin,

You can't override settings inside a spider class the way your code does:

    class FirstSpider(CrawlSpider):
        settings.overrides['ITEM_PIPELINES'] = ...

And you can't customize the item pipelines per spider.

What you could do is check the spider in your pipeline's process_item() and
skip the spiders that the pipeline should not handle. For example:

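    def process_item(self, item, spider):
        # Pass items through untouched for spiders this pipeline should skip.
        if spider.name not in ['myspider1', 'myspider2', 'myspider3']:
            return item
        # ... pipeline code here ...
        return item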
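
A minimal, self-contained sketch of this spider-name check. FakeSpider, SqlPipeline, ENABLED_FOR and the saved list are hypothetical stand-ins so the example runs without Scrapy or a database; only the process_item(self, item, spider) signature and spider.name come from the thread.

```python
class FakeSpider:
    """Hypothetical stand-in for a Scrapy spider; only `name` is used."""

    def __init__(self, name):
        self.name = name


class SqlPipeline:
    """Does its work only for spiders listed in ENABLED_FOR."""

    # Assumed spider names, taken from the example above.
    ENABLED_FOR = ('myspider1', 'myspider2', 'myspider3')

    def __init__(self):
        self.saved = []  # stands in for rows written to the database

    def process_item(self, item, spider):
        if spider.name not in self.ENABLED_FOR:
            # Not one of our spiders: pass the item through untouched.
            return item
        self.saved.append(item)  # placeholder for the real SQL work
        return item  # return the item so later pipelines still see it


pipeline = SqlPipeline()
pipeline.process_item({'id': 1}, FakeSpider('myspider1'))
pipeline.process_item({'id': 2}, FakeSpider('otherspider'))
print(pipeline.saved)
```

Only the item from myspider1 ends up in saved; the other item is returned unchanged.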

Hope this helps,
Pablo.


 
From: vitsin <vitaly.sinit...@gmail.com>
Date: Thu, 25 Nov 2010 07:14:12 -0800 (PST)
Local: Thurs, Nov 25 2010 10:14 am
Subject: Re: multi pipelines
hi,
Are you perhaps planning to add support for custom pipelines per spider?
10x,
--vs

On Nov 25, 9:28 am, Pablo Hoffman <pablohoff...@gmail.com> wrote:


 
From: Pablo Hoffman <pablohoff...@gmail.com>
Date: Thu, 25 Nov 2010 13:32:17 -0200
Local: Thurs, Nov 25 2010 10:32 am
Subject: Re: multi pipelines
Not for the moment.

But there are some nice alternatives for achieving that functionality. For
example, you can choose a spider attribute to define which pipelines will be
enabled for each spider, and then check that attribute in your pipelines.

Here's how your spiders would look:

    class SomeSpider(CrawlSpider):
        pipelines = ['first']

    class AnotherSpider(CrawlSpider):
        pipelines = ['first', 'second']

And your pipelines:

    class FirstPipeline(object):
        def process_item(self, item, spider):
            # Skip spiders that did not opt in to this pipeline.
            if 'first' not in getattr(spider, 'pipelines', []):
                return item

            # ... pipeline code here ...
            return item

    class SecondPipeline(object):
        def process_item(self, item, spider):
            # Skip spiders that did not opt in to this pipeline.
            if 'second' not in getattr(spider, 'pipelines', []):
                return item

            # ... pipeline code here ...
            return item

Btw, this code can easily be made more performant by using sets instead of
lists for the pipelines attribute, and by caching the pipelines per spider.
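
One possible reading of that suggestion, as a sketch rather than a definitive implementation: the spiders declare `pipelines` as a set, and each pipeline remembers its enabled/disabled decision per spider name. SpiderGatedPipeline, its `key` attribute, `handle()`, the `run()` helper and the stand-in spider classes are all hypothetical names so the example runs without Scrapy.

```python
class SomeSpider:
    name = 'some'
    pipelines = {'first'}  # a set instead of a list


class AnotherSpider:
    name = 'another'
    pipelines = {'first', 'second'}


class PlainSpider:
    name = 'plain'  # no `pipelines` attribute at all


class SpiderGatedPipeline:
    """Hypothetical base class: subclasses set `key` and implement handle()."""

    key = None

    def __init__(self):
        self._enabled = {}  # cache: spider name -> bool

    def is_enabled(self, spider):
        try:
            return self._enabled[spider.name]
        except KeyError:
            # First item from this spider: compute the answer once and cache it.
            enabled = self.key in getattr(spider, 'pipelines', ())
            self._enabled[spider.name] = enabled
            return enabled

    def process_item(self, item, spider):
        if not self.is_enabled(spider):
            return item
        return self.handle(item, spider)

    def handle(self, item, spider):
        raise NotImplementedError


class FirstPipeline(SpiderGatedPipeline):
    key = 'first'

    def handle(self, item, spider):
        item['first'] = True  # placeholder for the real pipeline work
        return item


class SecondPipeline(SpiderGatedPipeline):
    key = 'second'

    def handle(self, item, spider):
        item['second'] = True  # placeholder for the real pipeline work
        return item


first, second = FirstPipeline(), SecondPipeline()


def run(spider):
    """Push one empty item through both pipelines, in order."""
    item = {}
    for pipeline in (first, second):
        item = pipeline.process_item(item, spider)
    return item


print(run(SomeSpider()), run(AnotherSpider()), run(PlainSpider()))
```

With only a handful of pipeline names the gain over a list lookup is small; the cache mainly avoids repeating the getattr() and membership test for every item.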

Pablo.


 
From: vitsin <vitaly.sinit...@gmail.com>
Date: Thu, 25 Nov 2010 07:28:35 -0800 (PST)
Local: Thurs, Nov 25 2010 10:28 am
Subject: Re: multi pipelines
And thank you, your tip indeed helped me.

On Nov 25, 9:28 am, Pablo Hoffman <pablohoff...@gmail.com> wrote:


 


 
From: vitsin <vitaly.sinit...@gmail.com>
Date: Thu, 25 Nov 2010 07:35:33 -0800 (PST)
Local: Thurs, Nov 25 2010 10:35 am
Subject: Re: multi pipelines
So should the same exclusion check you did in process_item() also be
done in open_spider() and close_spider(), or is it enough to have
it in process_item()?
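
A sketch of how the check could be repeated in all three methods, assuming the pipeline acquires a resource in open_spider() that should not be opened for spiders it skips. Scrapy calls open_spider() and close_spider() on every enabled pipeline, so the check in process_item() alone would not prevent that. FakeSpider, the `connection` attribute, the `written` list and the `_enabled()` helper are hypothetical stand-ins, not Scrapy API.

```python
class FakeSpider:
    """Hypothetical stand-in for a Scrapy spider."""

    def __init__(self, name, pipelines=()):
        self.name = name
        self.pipelines = set(pipelines)


class SecondPipeline:
    def __init__(self):
        self.connection = None  # stands in for a database session
        self.written = []

    def _enabled(self, spider):
        return 'second' in getattr(spider, 'pipelines', ())

    def open_spider(self, spider):
        if not self._enabled(spider):
            return  # do not open the resource for spiders we skip
        self.connection = 'open'

    def close_spider(self, spider):
        if not self._enabled(spider):
            return  # nothing was opened, so nothing to close
        self.connection = None

    def process_item(self, item, spider):
        if not self._enabled(spider):
            return item
        self.written.append(item)  # placeholder for the real SQL work
        return item


pipeline = SecondPipeline()
skipped = FakeSpider('some', pipelines=['first'])
pipeline.open_spider(skipped)
print(pipeline.connection)
```

If open_spider() and close_spider() do nothing expensive, the check in process_item() may be all that is needed.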

On Nov 25, 10:32 am, Pablo Hoffman <pablohoff...@gmail.com> wrote:


 