ETL: Branches and Joins working together, oh my...


G. Richard Bellamy

Jan 7, 2010, 8:07:57 PM
to rhino-t...@googlegroups.com

It’s clear to me how I would Join and then Branch the results, but since a Join is considered a root operation, what is the recommended method for Branching and then Joining?

 

I understand this:

 

Table1 -\                /- Output1
        - Join - Branch -
Table2 -/                \- Output2

 

What I'm having a hard time with is this:

 

                    Table2 -\
                             - Join1 -\
                 /- Split1 -/          \
Table1 - Branch -                       - Join3 - Output
                 \- Split2 -\          /
                             - Join2 -/
                    Table3 -/

 

Does that make sense?

 

-rb

 

Simone Busoli

Jan 7, 2010, 8:13:54 PM
to rhino-t...@googlegroups.com
Congrats on the drawing ;) Anyway, off the top of my head, I don't think you can join after branching.



G. Richard Bellamy

Jan 7, 2010, 8:14:59 PM
to rhino-t...@googlegroups.com

Okay, so I see how I can do it now… looking at the ComplexUsersToPeopleJoinProcess, I see how I could implement something similar to the method used with GenericEnumerableOperation.

G. Richard Bellamy

Jan 7, 2010, 8:17:20 PM
to rhino-t...@googlegroups.com

Still, that’s not really using Branches, is it?

Simone Busoli

Jan 7, 2010, 8:19:22 PM
to rhino-t...@googlegroups.com
Exactly. In your case, if Split1 and Split2 come from the same db table, then instead of selecting once and branching, you select twice from the table, using two instances of the same input operation, and go on from there.
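
To make the data flow of that suggestion concrete, here is a minimal sketch in plain C# with LINQ. It is not Rhino ETL code: ReadProducts, the Product and Lookup records, and the sample rows are all hypothetical stand-ins, chosen only to mirror the Table1/Table2/Table3 diagram from the first message. The point is that each call to ReadProducts acts as a separate instance of the same input, so no branch is needed before the joins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain LINQ stand-in for the data flow only; none of these names are Rhino ETL API.
public static class BranchThenJoinSketch
{
    public record Product(int Id, string Name, int ManufacturerId, int RegionId);
    public record Lookup(int Id, string Label);

    // Hypothetical "input operation": every call is a fresh read of Table1.
    public static IEnumerable<Product> ReadProducts() => new[]
    {
        new Product(1, "Widget", 10, 100),
        new Product(2, "Gadget", 20, 100),
    };

    public static List<string> Run()
    {
        // Hypothetical contents of Table2 and Table3.
        var manufacturers = new[] { new Lookup(10, "Acme"), new Lookup(20, "Globex") };
        var regions = new[] { new Lookup(100, "EMEA") };

        // Join1: first read of Table1 joined to Table2.
        var join1 = ReadProducts().Join(
            manufacturers, p => p.ManufacturerId, m => m.Id,
            (p, m) => new { p.Id, p.Name, Manufacturer = m.Label });

        // Join2: second, independent read of Table1 joined to Table3.
        var join2 = ReadProducts().Join(
            regions, p => p.RegionId, r => r.Id,
            (p, r) => new { p.Id, Region = r.Label });

        // Join3: combine the two intermediate results on the product id.
        return join1.Join(
            join2, a => a.Id, b => b.Id,
            (a, b) => $"{a.Name}: {a.Manufacturer}, {b.Region}").ToList();
    }
}
```

Calling BranchThenJoinSketch.Run() yields one combined line per product. The trade-off is the one described above: the source is read twice instead of once, in exchange for not having to join after a branch.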

G. Richard Bellamy

Jan 7, 2010, 9:47:17 PM
to rhino-t...@googlegroups.com

So, I thought I’d share, and see if anyone has any pointers/critiques/comments. The ToXXXX Operations insert data, and return the input row and the “last_insert_id”.

 

protected override void Initialize()
{
  var cachedResults = new CacheResults();

  Register(new FromProductTable(Options.SourceConnectionStringName)).
  Register(new ToPodProductTable(Options.TargetConnectionStringName)).
  Register(new ToPodTable(Options.TargetConnectionStringName, "product")).
  Register(new CacheResults());

  Partial.
    Register(new JoinLookupsAndProducts().
      Left(Partial.
        Register(new ManufacturerFromProductTable(Options.SourceConnectionStringName)).
        Register(new ToPodLookupTable(Options.TargetConnectionStringName, "product_manufacturer")).
        Register(new ToPodTable(Options.TargetConnectionStringName, "product_manufacturer"))).
      Right(cachedResults)).
    Register(new ToProductPodRelationshipTable(Options.TargetConnectionStringName));

  Partial.
    Register(new JoinLookupsAndProducts().
      Left(Partial.
        Register(new CertificationCategoryFromProductTable(Options.SourceConnectionStringName)).
        Register(new ToPodLookupTable(Options.TargetConnectionStringName, "product_certification_category")).
        Register(new ToPodTable(Options.TargetConnectionStringName, "product_certification_category"))).
      Right(cachedResults)).
    Register(new ToProductPodRelationshipTable(Options.TargetConnectionStringName));

  Partial.
    Register(new JoinLookupsAndProducts().
      Left(Partial.
        Register(new RegionFromProductTable(Options.SourceConnectionStringName)).
        Register(new ToPodLookupTable(Options.TargetConnectionStringName, "product_region")).
        Register(new ToPodTable(Options.TargetConnectionStringName, "product_region"))).
      Right(cachedResults)).
    Register(new ToProductPodRelationshipTable(Options.TargetConnectionStringName));

  Partial.
    Register(new JoinLookupsAndProducts().
      Left(Partial.
        Register(new TypeFromProductTable(Options.SourceConnectionStringName)).
        Register(new ToPodLookupTable(Options.TargetConnectionStringName, "product_type")).
        Register(new ToPodTable(Options.TargetConnectionStringName, "product_type"))).
      Right(cachedResults)).
    Register(new ToProductPodRelationshipTable(Options.TargetConnectionStringName));
}

webpaul

Jan 7, 2010, 11:42:51 PM
to Rhino Tools Dev
You are calling new CacheResults() twice.

You may want to consider turning the paired ToPodLookupTable/ToPodTable
lines into a single reusable operation so you don't have to repeat
both lines each time.
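
One way to picture such a reusable operation is a small composite that runs a fixed sequence of steps as if they were one. The sketch below uses stand-in types only: IStep, TagStep, CompositeStep and PodSteps.LookupAndPod are hypothetical names invented for illustration, not Rhino ETL's interfaces, and TagStep merely marks a row rather than inserting anything. It is meant to show the shape of the refactoring, under those assumptions, rather than a drop-in implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in types for illustration only; these are NOT Rhino ETL's interfaces.
public interface IStep
{
    IEnumerable<Dictionary<string, object>> Execute(IEnumerable<Dictionary<string, object>> rows);
}

// Hypothetical step: records under a key which table a row was "written" to.
public sealed class TagStep : IStep
{
    private readonly string _key;
    private readonly string _table;

    public TagStep(string key, string table)
    {
        _key = key;
        _table = table;
    }

    public IEnumerable<Dictionary<string, object>> Execute(IEnumerable<Dictionary<string, object>> rows)
    {
        foreach (var row in rows)
        {
            row[_key] = _table;
            yield return row;
        }
    }
}

// Chains several steps so callers can register them as a single unit.
public sealed class CompositeStep : IStep
{
    private readonly IStep[] _steps;

    public CompositeStep(params IStep[] steps)
    {
        _steps = steps;
    }

    public IEnumerable<Dictionary<string, object>> Execute(IEnumerable<Dictionary<string, object>> rows)
    {
        // Feed the output of each step into the next one, in order.
        return _steps.Aggregate(rows, (current, step) => step.Execute(current));
    }
}

public static class PodSteps
{
    // One call stands in for the repeated lookup/pod pair for a given table name.
    public static IStep LookupAndPod(string table)
    {
        return new CompositeStep(new TagStep("lookup", table), new TagStep("pod", table));
    }
}
```

With a helper like this, each of the four blocks would name its table once, for example PodSteps.LookupAndPod("product_region"), instead of passing the same string to two separate registrations.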


Simone Busoli

Jan 8, 2010, 2:24:55 AM
to rhino-t...@googlegroups.com
Mhh, is this code working? I think I understand what you are trying to do, but I see that the partial processes you create aren't being attached to the main process anywhere, so afaics they shouldn't be executed.
Another point is that you are creating a new CacheResults on the 4th call to register. Isn't that supposed to be the same one you instantiate on the first line? I might be wrong though.

Anyway, I'm not sure you need that CacheResults operation. I think you can just instantiate the operations and the executor will take care of caching their results, but I suggest you try it out, since it might depend on the pipeline executor you are using.

G. Richard Bellamy

Jan 8, 2010, 2:36:08 AM
to rhino-t...@googlegroups.com
Thanks for catching the double call to CacheResults().

Nice suggestion regarding merging the two others, I'll look at that...
