ETL Process

John Paraskevopoulos

May 21, 2014, 8:38:09 AM
to rhino-t...@googlegroups.com
Hi,

I am trying to implement a complex ETL process. My Initialize code is shown below:

            Register    // JoinVillas is a FullOuterJoin that adds an enum value indicating whether the file's record is new or updated
                (
                    new JoinVillas()
                    .Left(new DBVillas())
                    .Right(new NSProducts(Path.Combine(@".\Data\products.xml")))
                );
            Register
                (
                    new BranchingOperation()
                    .Add  // First I want to filter all the new records in order to add them to the db
                    (
                        Partial
                        .Register(new FilterAddedVillas()) // Filters the XML data, selects only the new records and transforms them to the db format
                        .Register(new InsertVillas()) // This is a SqlBulkInsertOperation
                    )
                    .Add // Then, for all the records (no matter whether they are new or updated), I want to process three detail tables
                    (
                        Partial
                        .Register(new ProcessVillas()) // Assumes that the previous branch has already run; gives the XML records their db id
                        .Register
                            (
                                new BranchingOperation()
                                .Add
                                (
                                    Partial
                                    .Register
                                    (
                                        new BranchingOperation()
                                        .Add
                                        (
                                            new ConventionOutputCommandOperation("Connection")
                                            {
                                                Command = "DELETE FROM DIARY WHERE PropId = @PropId"
                                            }
                                        )
                                        .Add
                                        (
                                            Partial
                                            .Register(new TransformToDiary())
                                            .Register(new InsertDiary()) // This is a SqlBulkInsertOperation
                                        )
                                    )       
                                )
                                .Add
                                .
                                . //Some more detail processing here
                            )
                       
                    )
                );

Although I have added some comments to the code above, I would like to explain what I am trying to achieve:

I have a huge XML file (800 MB) that I need to load into the db. It contains new records as well as records to be updated (I would also like to mark as deleted any records that no longer exist in the XML).
So what I am doing is using a FullOuterJoin JoinOperation that takes all the data from the XML, clones each row, and adds the record id from the db plus another column that specifies whether it is a new, existing, or no-longer-existing record.
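
The classification step described above can be sketched in plain C# without Rhino ETL. This is only an illustration under assumptions: the names `RecordState`, `VillaJoinSketch`, and `Classify`, and the use of a string business key, are hypothetical and are not taken from the actual JoinVillas class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names throughout; not the actual JoinVillas implementation.
public enum RecordState { New, Existing, Deleted }

public static class VillaJoinSketch
{
    // dbIdsByKey: business key -> auto id already stored in the db (assumed shape).
    // xmlKeys:    business keys present in the XML file (assumed shape).
    public static List<(string Key, int? DbId, RecordState State)> Classify(
        IDictionary<string, int> dbIdsByKey, IEnumerable<string> xmlKeys)
    {
        var result = new List<(string Key, int? DbId, RecordState State)>();
        var seen = new HashSet<string>();

        foreach (var key in xmlKeys)
        {
            seen.Add(key);
            if (dbIdsByKey.TryGetValue(key, out var id))
                result.Add((key, (int?)id, RecordState.Existing)); // present on both sides
            else
                result.Add((key, (int?)null, RecordState.New));    // present only in the XML
        }

        // Rows that exist only in the db are no longer in the XML.
        foreach (var pair in dbIdsByKey.Where(p => !seen.Contains(p.Key)))
            result.Add((pair.Key, (int?)pair.Value, RecordState.Deleted));

        return result;
    }

    public static void Main()
    {
        var db = new Dictionary<string, int> { ["A"] = 1, ["B"] = 2 };
        var xml = new[] { "B", "C" };
        foreach (var row in Classify(db, xml))
            Console.WriteLine($"{row.Key} {row.DbId} {row.State}");
    }
}
```

In this sketch, new rows carry no db id yet, which is why a later step has to fetch the generated id after the bulk insert before the detail records can be processed.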

In the next step I have a BranchingOperation. In its first branch I want to filter the new records and then bulk insert them into the db. The next branch would fetch the auto id field from the db and add it to the row so that I can process the detail records in the XML.

The last step is another BranchingOperation for the detail records, which transforms them into the db format and then bulk inserts them.

This whole process doesn't seem to be quite working. Am I correct in assuming that it will work the way I expect it to?

Can you please provide some assistance?

Giannis