RHINO-ETL - Multiple Sql inputs (revised)

204 views
Skip to first unread message

John Strzempa

unread,
Jun 17, 2012, 11:15:58 AM6/17/12
to rhino-t...@googlegroups.com
So I asked this earlier but wanted to provide more detail in the hopes that I can find the correct way to accomplish this.
I do a select statement on the source database but as I run through the process there are other tables in the source database that I want to incorporate into the rows so I can use their values.
For example, I grab a product record from the source but then I want to query the destination table to see if that product already exists and I join them on SKU. The problem with just a simple join is the SKU from the source has to be formatted for use in the destination database.

So, I'm setting it up like below and wondering if this is the best way to accomplish this task:

Register(new GetProducts("DataSource1"));
Register(new FormatSku());
Register(new GetDestProduct());

Inside the GetDestProduct

                string sku = ((string)row["SKU"] ?? "").Trim();

                int recordCount = Use.Transaction("DataSource2", delegate(IDbCommand cmd)
                {
                    cmd.CommandText = "SELECT ProductId from Products where Sku = @Sku";
                    cmd.Parameters.Add(sku);
                    return (int) cmd.ExecuteScalar();
                });

                if (recordCount > 0)
                {
                    row["ProductAction"] = "update";
                    row["ProductLocation"] = "exists";

                    Use.Transaction("DataSource2", delegate(IDbCommand cmd)
                    {
                        cmd.CommandText = "SELECT ProductId, Name, Summary, Description, ExtendedDescription, ThumbnailUrl, ImageUrl, InventoryModeId"
                                        + "FROM Products WHERE LTrim(RTrim(Sku)) = @Sku";

                        using (IDataReader reader = cmd.ExecuteReader())
                            while (reader.Read())
                            {
                                row["PROD_ID"] = reader.GetInt32(0);
                                row["DESC1"] = reader.GetString(1);
                            }
                    });
                }


Another scenario may present itself where I want a record from a table that has nothing to do with the products table. Would I just add that row to the row object in a similar manner as above?
When I asked the question earlier it was said that joins would accomplish this but all the join operations I've seen require a way to join the two inputs and in this case, there not be any.

Jason Meckley

unread,
Jun 18, 2012, 8:07:10 AM6/18/12
to rhino-t...@googlegroups.com
What you currently have is a select N+1 scenario. Rather load all the rows you want from source and all the products you want from the destination database and join them in memory.

Register(new MyJoinOperatoin()
         .Left(Partial.Register(new GetProducts("DataSource1")).new FormatSku()))
         .Right(new GetDestProducts())

MyJoinOperation is an implementation of JoinOperation. I would assume you would want to left outer join so the implementation would look something like this

MyJoinOperation
{
   protected Row Join(Row left, Row right)
   {
       var clone = left.Clone();
       if(right.Contains("Sku"))
       {
              // sku exists in destination.
              clone["action"] = "update";
             ...
       }
       else
       {
             // sku does not exist in destination
              clone["action"] = "add"; 
            ...
       }
       return clone;
   }

   protected void SetupJoin()
   {
      LeftJoin.Left("Sku").Right("Sku");
Reply all
Reply to author
Forward
0 new messages