Add Single Row to Beginning of pipeline.

stellard

Oct 10, 2012, 8:37:37 AM
to activewareh...@googlegroups.com
Is there any way to insert a single row into the beginning of the pipeline?

For example, I have a parent-child hierarchy, and before I load it into the data warehouse I need to create a single root parent.

I was thinking it could be done with a processor (with some kind of flag so that it only inserts the row the first time), but this would be very inefficient, as the processor would run against every row. Is there any other way?

Thibaut Barrère

Oct 10, 2012, 8:48:44 AM
to activewareh...@googlegroups.com
Hello!

> Is there any way to insert a single row into the beginning of the pipeline?
> For example, I have a parent-child hierarchy, and before I load it into the data warehouse I need to create a single root parent.
> I was thinking it could be done with a processor (with some kind of flag so that it only inserts the row the first time), but this would be very inefficient, as the processor would run against every row. Is there any other way?

I usually add an "enumerable" source before the first source to create such records. A classic example is creating an "unknown" record as the first row of a dimension.

Here is an example:

default_record = { field: value, ... }

source :default, { type: :enumerable, enumerable: [default_record] }
source :your_true_source

In that case your default record will go through the whole pipeline like any other row, so you may want to fill in sensible default values for every field the pipeline expects.
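
To make the ordering concrete, here is a rough sketch in plain Ruby. It does not use the ETL library at all; it only mimics the idea that rows from the first declared source come out first and that every row, including the default one, passes through the same transforms. The field names, the rows and the upcase_name transform are all made up for illustration, and reading sources in declaration order is an assumption based on the advice above.

```ruby
# Illustration only: plain Ruby, not the ETL library's internals.
# Field names and rows below are hypothetical.
default_record = { id: 0, parent_id: nil, name: 'root' }

true_source_rows = [
  { id: 1, parent_id: 0, name: 'child a' },
  { id: 2, parent_id: 0, name: 'child b' }
]

# Assumption: sources are read one after the other, in declaration order,
# so the enumerable source holding the default record comes first.
sources = [[default_record], true_source_rows]

# Stand-in for the pipeline's transforms. It is applied to every row,
# which is why the default record needs usable values in every field.
upcase_name = ->(row) { row.merge(name: row[:name].upcase) }

loaded = sources.flat_map { |rows| rows.map(&upcase_name) }

loaded.each { |row| puts row.inspect }
```

The root row comes out first, followed by the rows from the real source, and all three have been through the transform.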

I usually rely on the following trick when applicable:

default_record = Hash[*required_fields.zip(['unknown']*(required_fields.size)).flatten]
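
For anyone reading along, this is what that one-liner produces. The required_fields list here is hypothetical; in a real control file it would be whatever fields your pipeline needs. The second form assumes Ruby 2.6 or later (block form of Array#to_h) and should give the same result.

```ruby
# Hypothetical list of fields the pipeline expects on every row.
required_fields = [:id, :name, :category]

# The trick from above: pair each field with 'unknown', flatten the pairs
# into a single list, and splat that list into Hash[].
default_record = Hash[*required_fields.zip(['unknown'] * required_fields.size).flatten]

# Equivalent, arguably easier to read (needs Ruby 2.6+ for the block form).
same_record = required_fields.to_h { |field| [field, 'unknown'] }

puts default_record.inspect
puts same_record.inspect
```

Both build a hash with one 'unknown' value per required field.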

This is just one possibility; you could also create the record manually in a preprocessor, using ActiveRecord or similar, or in another control file.

Let me know if it helps, or not!

PS: I like to make such ETL processes "idempotent", i.e. able to be re-run without creating issues. Such a script would automatically upsert the root record and the records that follow it, so you can run the whole thing without worrying about whether you are bootstrapping or not.
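
A minimal sketch of what "upsert" means here, again in plain Ruby with an in-memory hash standing in for the dimension table. The upsert and run_load helpers, the :id key and the rows are hypothetical; in a real setup the same idea would be implemented against the database.

```ruby
# In-memory stand-in for a dimension table, keyed by :id (hypothetical).
# Insert the row if its key is new, otherwise merge it over the existing row.
def upsert(table, row, key: :id)
  id = row.fetch(key)
  table[id] = (table[id] || {}).merge(row)
end

# Hypothetical load step: upsert every row, root record included.
def run_load(table, rows)
  rows.each { |row| upsert(table, row) }
  table
end

rows = [
  { id: 0, parent_id: nil, name: 'root' },
  { id: 1, parent_id: 0, name: 'child a' }
]

table = {}
run_load(table, rows)
after_first_run = table.dup

# Running the same load again changes nothing: no duplicates, same content.
run_load(table, rows)
puts table == after_first_run
```

Because the load only ever upserts, the first run bootstraps the root record and later runs simply leave it in place.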

Thibaut
--

stellard

Oct 10, 2012, 10:31:05 AM
to activewareh...@googlegroups.com
This is perfect. I did not realise you could have multiple sources per control file. Thanks again Thibaut. 

Thibaut Barrère

Oct 10, 2012, 3:51:21 PM
to activewareh...@googlegroups.com
> This is perfect. I did not realise you could have multiple sources per control file. Thanks again Thibaut. 

You're welcome :)

Thibaut
--