Rhino.Etl: Very Confused by SingleThreadedPipelineExecuter vs ThreadPoolPipelineExecuter

G. Richard Bellamy

Dec 13, 2009, 1:41:34 PM
to rhino-t...@googlegroups.com

Okay, so I’ve been struggling with trying to troubleshoot an issue with an operation dying and not seeming to complete using the ThreadPoolPipelineExecuter. I have a feeling I’m very wrong in some of my fundamental assumptions.

 

I was pointed at the SingleThreadedPipelineExecuter to help me track down my problem. During that discovery process, I have found that the CachingEnumerable and the ThreadSafeEnumerator are initialized very differently, and as a result this causes the pipeline to behave differently between the two. The difference is in DecorateEnumerableForExecution – the SingleThreadedPipelineExecuter doesn’t enumerate prior to initialization of the CachingEnumerable, while the ThreadPoolPipelineExecuter does enumerate prior to the initialization of the ThreadSafeEnumerator.

 

The consequence of the above is that the following test fails for SingleThreadedPipelineExecuter with firstAccumulator having a count of zero, and for the ThreadPoolPipelineExecuter the test fails because it is only iterated once (has a count of 1). I’m hoping those smarter and more familiar with Rhino.Etl can give me some guidance on what I should see as expected behavior, or what a fix would look like – if it requires a fix, I’m happy to submit a patch, but I’m having a hard time figuring out where to start.

 

namespace Rhino.Etl.Tests
{
    [TestFixture]
    public class SingleThreadedPipelineExecuterTest
    {
        //stuff here

        [Test]
        public void AllOperationsShouldBeEnumerated()
        {
            var firstAccumulator = new ArrayList();
            var secondAccumulator = new ArrayList();

            using (var process = MockRepository.GenerateStub<EtlProcess>())
            {
                process.Stub(x => x.TranslateRows(null)).IgnoreArguments().WhenCalled(
                    x => x.ReturnValue = x.Arguments[0]);

                /* UNCOMMENT ONE OR THE OTHER TO CHECK THINGS OUT */
                //process.PipelineExecuter = new SingleThreadedPipelineExecuter();
                //process.PipelineExecuter = new ThreadPoolPipelineExecuter();

                process.Register(new GenericEnumerableOperation(new[] { Row.FromObject(new { First = "First" }) }));
                process.Register(new OutputSpyOperation(2, r => firstAccumulator.Add(r["First"]))); // does not enumerate
                process.Register(new GenericEnumerableOperation(new[] { Row.FromObject(new { Second = "Second" }) }));
                process.Register(new OutputSpyOperation(2, r => secondAccumulator.Add(r["Second"])));

                process.Execute();
            }

            CollectionAssert.AreElementsEqual(firstAccumulator, Enumerable.Repeat("First", 2)); // fails here
            CollectionAssert.AreElementsEqual(secondAccumulator, Enumerable.Repeat("Second", 2));
        }

        //stuff here
    }
}

 

Thanks in advance for any guidance you may have.

-rb

webpaul

Dec 14, 2009, 5:01:33 PM
to Rhino Tools Dev
Can you post the source of the two operations you are using here? You
also may want to check Register vs. RegisterLast to make sure these
are being processed in the order you think they are.


G. Richard Bellamy

Dec 15, 2009, 9:12:38 AM
to G. Richard Bellamy, rhino-t...@googlegroups.com

As a matter of fact, that’s why I included the Test in my post… if you take that test and drop it in the SingleThreadedPipelineExecuterTest.cs, you’ll be able to see what I’m talking about.

 

In other words, it’s a failing test that should pass, in my opinion. Not only that, it fails in two different ways, depending on the threading model.

 

Thanks for your help!

Richard

webpaul

Dec 15, 2009, 12:01:51 PM
to Rhino Tools Dev
Are GenericEnumerableOperation and OutputSpyOperation yours? If so,
post the source for those - I'm just trying to rule out a simple
error in the test before digging into the Rhino ETL source.


Simone Busoli

Dec 15, 2009, 3:42:10 PM
to rhino-t...@googlegroups.com
Hi Richard,

The first problem I spot in your test is that you're registering an output operation which doesn't yield any rows (the second call to Register) in the middle of the pipeline.

Let's tackle the single-threaded case first, calling the registered operations, for brevity, a, b, c, d, respectively:

1 - The pipeline executor starts pulling from the tail of the pipeline, thus asking d to yield its first result.
2 - d starts running, enters the for, then the foreach, and starts iterating the rows argument, which is fed by c.
3 - c executes, and since its Execute method body is not an iterator block, it simply returns the array of rows you constructed c with, without ever enumerating its own input.
4 - d continues executing normally, thus filling secondAccumulator, but no rows are ever requested from b, let alone a.

The pipeline stops between b and c; that's why you see firstAccumulator empty.
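
For anyone reading along, the four steps above can be reproduced without Rhino.Etl at all. What follows is only a minimal sketch of the same pull-from-the-tail laziness using plain IEnumerable<string>; the names LazyPipelineSketch, Source and Spy are made up for illustration and are not Rhino.Etl classes, and the real operations work on Row objects rather than strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the a, b, c, d pipeline; NOT Rhino.Etl code.
public static class LazyPipelineSketch
{
    public static readonly List<string> Log = new List<string>();

    // Like c in the steps above: not an iterator block, so it just hands
    // back its own rows and never enumerates 'upstream'.
    public static IEnumerable<string> Source(IEnumerable<string> upstream, string[] rows)
    {
        return rows;
    }

    // Like b and d: an iterator block that reads its input twice and
    // yields nothing. Its body only runs once somebody enumerates it.
    public static IEnumerable<string> Spy(IEnumerable<string> upstream, string name)
    {
        for (int i = 0; i < 2; i++)
        {
            foreach (var row in upstream)
            {
                Log.Add(name + ":" + row);
            }
        }
        yield break;
    }

    public static List<string> Run()
    {
        Log.Clear();
        IEnumerable<string> a = Source(Enumerable.Empty<string>(), new[] { "First" });
        IEnumerable<string> b = Spy(a, "b");
        IEnumerable<string> c = Source(b, new[] { "Second" });
        IEnumerable<string> d = Spy(c, "d");

        // The executor pulls from the tail: only d is enumerated directly.
        foreach (var unused in d) { }

        return Log; // contains "d:Second" twice and no "b:" entries
    }
}
```

Run() returns two "d:Second" entries and nothing from b: since c never touches its input, the body of b is never started, which mirrors the empty firstAccumulator in the failing test.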

About the multi-threaded case: there was a discussion some time ago where I think we concluded that it is correct not to be able to iterate the rows twice. That's why you see both accumulators with one row instead of two. In this case the pipeline interruption doesn't occur, because the multi-threaded executor iterates the operations eagerly (although the test is wrongly designed in any case), but the output is a single row instead of two.
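
To make the "one row instead of two" effect concrete, here is a rough model of an input that can only be walked once. This is an assumption-laden sketch, not the actual ThreadSafeEnumerator: OneShotEnumerable and OneShotSketch are hypothetical names, and the only thing they are meant to show is what a consumer that loops over its input twice sees when the second pass finds the rows already consumed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: every GetEnumerator() call returns the SAME
// underlying enumerator, so the rows can only be consumed once.
public sealed class OneShotEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> shared;

    public OneShotEnumerable(IEnumerable<T> inner)
    {
        shared = inner.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator() { return shared; }

    IEnumerator IEnumerable.GetEnumerator() { return shared; }
}

public static class OneShotSketch
{
    // Mimics a spy that iterates its input twice (the for around the foreach).
    public static int CountOverTwoPasses(IEnumerable<string> rows)
    {
        int seen = 0;
        for (int pass = 0; pass < 2; pass++)
        {
            foreach (var row in rows)
            {
                seen++;
            }
        }
        return seen;
    }

    // A plain array can be re-enumerated: both passes see the row.
    public static int WithArray()
    {
        return CountOverTwoPasses(new[] { "First" }); // 2
    }

    // The one-shot wrapper is exhausted after the first pass.
    public static int WithOneShot()
    {
        return CountOverTwoPasses(new OneShotEnumerable<string>(new[] { "First" })); // 1
    }
}
```

WithArray() gives 2 and WithOneShot() gives 1, which is the same shape as the accumulator counts reported for the two executers.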

--

You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To post to this group, send email to rhino-t...@googlegroups.com.
To unsubscribe from this group, send email to rhino-tools-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.

rbellamy

Dec 16, 2009, 12:18:14 AM
to Rhino Tools Dev
Those are both part of the SingleThreadedPipelineExecuterTest.cs file,
which is part of the current codebase.

rbellamy

Dec 16, 2009, 12:39:40 AM
to Rhino Tools Dev
Thank you for the detailed explanation.

#1 explains things nicely... I was under the mistaken impression that
the pipeline executor would pull from the head.

I knew I was making a fundamentally incorrect assumption.

This would mean that a pipeline, by definition, should never have more
than one output operation which doesn't yield rows? Therefore, the
process I've constructed is malformed - it should, in fact, be two.
Correct? (Oh great... just found a post from Sept that says EXACTLY
that!!!!!)

Now - what is RegisterLast for?


Simone Busoli

Dec 16, 2009, 1:56:23 AM
to rhino-t...@googlegroups.com
On Wed, Dec 16, 2009 at 06:39, rbellamy <rbel...@pteradigm.com> wrote:
> This would mean that a pipeline, by definition, should never have more
> than one output operation which doesn't yield rows? Therefore, the
> process I've constructed is malformed - it should, in fact, be two.
> Correct?

Exactly, with the exception of branching operations, which let you split the pipeline into multiple branches, each of which can have its own output.

webpaul

Dec 17, 2009, 1:16:10 PM
to Rhino Tools Dev
I would actually expect an operation called "OutputSpy" not to eat the
rows but to pass them through, which is why I wanted to see the source
for that. There isn't any reason, other than clarity, for an output
operation not to pass the rows through, though; as long as you yield
return them, the next operation will get them.
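
As a rough illustration of the pass-through idea, here is a self-contained sketch in which the spy reports each row and then yields it on. PassThroughSpySketch and its Spy helper are invented for this example and operate on strings; they are not the OutputSpyOperation from the test fixture.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pass-through spy; NOT a Rhino.Etl class.
public static class PassThroughSpySketch
{
    // Reports every row to the callback, then hands it to the next stage.
    public static IEnumerable<string> Spy(IEnumerable<string> rows, Action<string> onRow)
    {
        foreach (var row in rows)
        {
            onRow(row);
            yield return row; // this is what keeps the pipeline flowing
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        IEnumerable<string> source = new[] { "First" };

        // Two spies chained: the second one still receives the row.
        IEnumerable<string> pipeline =
            Spy(Spy(source, r => log.Add("inner:" + r)), r => log.Add("outer:" + r));

        // Pull from the tail, as the executor would.
        foreach (var unused in pipeline) { }

        return log; // "inner:First", then "outer:First"
    }
}
```

Because each spy yields what it receives, a later stage sees the same rows, so both callbacks fire even though only the tail is enumerated.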


G. Richard Bellamy

Dec 17, 2009, 1:21:25 PM
to rhino-t...@googlegroups.com
That makes complete sense. I was just reusing a utility class that was
already in the test fixture.


Simone Busoli

Dec 17, 2009, 3:00:22 PM
to rhino-t...@googlegroups.com
Paul, the source has always been here <http://github.com/ayende/rhino-etl/blob/master/Rhino.Etl.Tests/Single...>, but you didn't even take the time to go over it or to be helpful in any way, so what are you actually complaining about?


Ayende Rahien

Dec 17, 2009, 3:44:48 PM
to rhino-t...@googlegroups.com
Simone,
I think that Paul had the same thought as I did: looking at the code, there is no way to know what the referenced classes are, and even though I wrote most of RETL, _I_ don't remember all the classes involved.

Simone Busoli

Dec 17, 2009, 3:57:14 PM
to rhino-t...@googlegroups.com
I was just pointing out that he first asked where those classes came from before even bothering to look at the code, then complained about their purpose, in a completely unhelpful manner. And I don't like this at all, since it's not even the first time that he has come up with assumptions and statements about how the code is written and performs without even taking the time to patch it.

webpaul

Dec 18, 2009, 6:32:42 AM
to Rhino Tools Dev
Simone, I apologize for my insolence in not recognizing that those test
classes weren't his.

What you were recommending is more or less how I answered him here
(http://groups.google.com/group/rhino-tools-dev/browse_thread/thread/991ea6b2e955f3f4),
though your explanation was clearer and more detailed, so congrats.
