Rhino ETL - Row objects rather than strongly typed data objects


Helen Emerson

Sep 27, 2011, 10:27:57 AM
to Rhino Tools Dev
We're just starting to use RhinoETL to do some data analysis.

Some of the team were wondering why all operations have to take in a
Row and return a Row rather than doing something with generics so it
would be possible to pass data objects through the system.

Was it just a matter of keeping the interfaces simple? Was it a
performance thing so there's not so much time spent marshaling values
between objects? Does it just lend itself to writing simple data
manipulations?

I think our main problem is that it feels a bit strange to be using
data structures so similar to data sets and data tables after moving
away from that in our web application code. Does anyone know why it's
designed this way?

Jason Meckley

Sep 28, 2011, 9:03:46 AM
to rhino-t...@googlegroups.com
I will start with your last question first and move backwards.

Objects are a great way to express the behavior of a process, but they are not a cure-all. ETL (in general, not just Rhino) is the idea of Extracting data from the source, Transforming it and Loading the transformed result into the destination. It's designed to work with extremely large result sets of tabular data, not object graphs, so it makes sense that the core data structure closely resembles tabular data. Rhino.ETL is an alternative to MS SQL Server SSIS (previously DTS).

The Row object was chosen because, prior to .NET 4's dynamic objects, a dictionary/hashtable was the only way to capture dynamic data. Row is unusual in that it will not throw if a member doesn't exist; it will just create that member and return null. Row (and Rhino.ETL, for that matter) is designed to work with the Boo scripting language (http://boo.codehaus.org/).
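To make the "will not throw" behavior concrete, here is a minimal C# sketch of a dictionary-backed row that behaves the way it is described above. LenientRow and its Contains method are hypothetical names invented for this illustration; this is not Rhino.ETL's actual Row implementation, only a stand-in modeled on that description.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; NOT Rhino.ETL's Row class.
public class LenientRow
{
    private readonly Dictionary<string, object> values =
        new Dictionary<string, object>();

    public object this[string key]
    {
        get
        {
            object value;
            if (!values.TryGetValue(key, out value))
            {
                // Assumption taken from the description above: a missing
                // member is created and null is returned instead of throwing.
                values[key] = null;
                return null;
            }
            return value;
        }
        set { values[key] = value; }
    }

    public bool Contains(string key)
    {
        return values.ContainsKey(key);
    }
}

public static class LenientRowDemo
{
    public static void Main()
    {
        var row = new LenientRow();
        row["Number"] = 1;

        Console.WriteLine(row["Number"]);           // prints 1
        Console.WriteLine(row["Missing"] == null);  // prints True, no exception
        Console.WriteLine(row.Contains("Missing")); // prints True, created by the read
    }
}
```

A plain Dictionary<string, object> would throw KeyNotFoundException on the second read, which is the difference that makes this style convenient for loosely shaped tabular data.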

With Boo you can do
row = Row()
row.Number = 1
row.Text = "a"

The C# equivalent is
var row = new Row();
row["Number"] = 1;
row["Text"] = "a";

There are convenience methods to convert a row to an object and vice versa if you want to work with strongly typed objects.
Row.FromObject(object)
Row.FromReader(IDataReader)
row.ToObject<T>();
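
As a rough sketch of how those conversion methods might be used for a round trip between a typed object and a Row: the Customer class is a hypothetical example type, and the Rhino.Etl.Core namespace plus the assumption that properties are matched to row keys by name are mine rather than something stated in this thread, so check them against the version of Rhino.ETL you are using.

```csharp
using System;
using Rhino.Etl.Core; // assumed namespace of Row; requires the Rhino.ETL assembly

// Hypothetical data class used only for this illustration.
public class Customer
{
    public int Number { get; set; }
    public string Text { get; set; }
}

public static class RowConversionDemo
{
    public static void Main()
    {
        var customer = new Customer { Number = 1, Text = "a" };

        // Typed object -> Row, using the convenience method listed above.
        Row row = Row.FromObject(customer);

        // Work with the row inside the pipeline as usual.
        row["Text"] = "b";

        // Row -> typed object again, e.g. for assertions in a test.
        Customer result = row.ToObject<Customer>();
        Console.WriteLine(result.Number + " " + result.Text);
    }
}
```

This keeps Row as the type that flows between operations while still letting code at the edges, such as test setup and assertions, use strongly typed objects.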

Helen Emerson

Sep 29, 2011, 9:38:40 AM
to Rhino Tools Dev
Thank you very much. I think we're going to try to stick with rows all
the way through our pipeline. The real problem was how to make our
tests readable; once we started focusing on that, using lists of rows
rather than lists of objects turned out not to be the issue.

Helen
