I am playing the framework and I am trying to understand a bit the SCD part of it.
My plan is to export from a mysql to a File. Then do the transform from a File to another File , and finally write to a postgresql (using bulk upload).
1) Does Bulk upload manages SCD? or this is not even an option in this case?
2) But I see duplicate everytime I run it, is there something I need to do to prevent duplicate?
3) when I specificy the following directive, I see nothing outputed in the file nor in the database ever (even after truncating the final destination and rerunning it).... destination :out, { :file => "../tmp/t_board_accounts.csv", :append => false, :natural_key => [ :id ], :scd_fields => [ :crawling_enabled, :max_candidates_per_day], :scd => { :dimension_target => :datawarehouse, :dimension_table => "board_accounts", :type => 2 }
The code here is interesting because it assumes that instead of doing an update and insert, to make good use of bulkupload, it does a delete and create 2 new rows :)
one for the old record and marked as expired, and one for the updated version.
This means that you get 1 delete and 2 insert. So the consequence is it needs table where you can write your own SK otherwise that will break . Also I hope AR can deal with this correctly because I recall AR not being a happy camper when setting an id yourself...
> The code here is interesting because it assumes that instead of doing an > update and insert, to make good use of bulkupload, it does a delete and > create 2 new rows :)
> one for the old record and marked as expired, and one for the updated > version.
> This means that you get 1 delete and 2 insert. So the consequence is it > needs table where you can write your own SK otherwise that will break . > Also I hope AR can deal with this correctly because I recall AR not being a > happy camper when setting an id yourself...
> More to come as I debug it
> On Monday, October 8, 2012 11:47:44 AM UTC-7, Thibaut Barrère wrote:
Ok , I found what is not working and why I could not see it working fully
The old row gets added with the incorrect SK because I have a virtual key setup.. and it uses the max(id) from my table. Something about that is not working correctly when generating the rows and virtual key are used.
Not sure yet how to fix it but I can reproduce this issue now
On Thursday, October 25, 2012 3:19:51 PM UTC-7, Emmanuel wrote:
> So it does work with bulk upload and got it to fully work on a small > example. Just gotta watch out for the fact that you get a delete , then 2 > insert
> Emmanuel
> On Thursday, October 25, 2012 3:03:07 PM UTC-7, Emmanuel wrote:
>> The code here is interesting because it assumes that instead of doing an >> update and insert, to make good use of bulkupload, it does a delete and >> create 2 new rows :)
>> one for the old record and marked as expired, and one for the updated >> version.
>> This means that you get 1 delete and 2 insert. So the consequence is it >> needs table where you can write your own SK otherwise that will break . >> Also I hope AR can deal with this correctly because I recall AR not being a >> happy camper when setting an id yourself...
>> More to come as I debug it
>> On Monday, October 8, 2012 11:47:44 AM UTC-7, Thibaut Barrère wrote:
I got it to fully work correctly but there seems to be a specific naming that has to be correct between the way you specify the virtual field using the SurrogateKeyGenerator and the way you save it. Then the ID comes back correct and work with bulkload :)
Bottom line it works and allows me now to use the bulk load .
On Thursday, October 25, 2012 3:27:00 PM UTC-7, Emmanuel wrote:
> Ok , I found what is not working and why I could not see it working fully
> The old row gets added with the incorrect SK because I have a virtual key > setup.. and it uses the max(id) from my table. Something about that is not > working correctly when generating the rows and virtual key are used.
> Not sure yet how to fix it but I can reproduce this issue now
> Emmanuel
> On Thursday, October 25, 2012 3:19:51 PM UTC-7, Emmanuel wrote:
>> So it does work with bulk upload and got it to fully work on a small >> example. Just gotta watch out for the fact that you get a delete , then 2 >> insert
>> Emmanuel
>> On Thursday, October 25, 2012 3:03:07 PM UTC-7, Emmanuel wrote:
>>> The code here is interesting because it assumes that instead of doing an >>> update and insert, to make good use of bulkupload, it does a delete and >>> create 2 new rows :)
>>> one for the old record and marked as expired, and one for the updated >>> version.
>>> This means that you get 1 delete and 2 insert. So the consequence is it >>> needs table where you can write your own SK otherwise that will break . >>> Also I hope AR can deal with this correctly because I recall AR not being a >>> happy camper when setting an id yourself...
>>> More to come as I debug it
>>> On Monday, October 8, 2012 11:47:44 AM UTC-7, Thibaut Barrère wrote:
More things to watch out for. If you put a field from the input that is not coming from the database output. Also boolean fields.. Between mysql to postgre it looks different and caused constant changes.
The one downside of the current implementation is that it delete all the records and read them the same way if unchanged . So during the transform lap till load , you have no records or missing records . Just a gotcha to watch out for
Normally it should be an update and insert which then ensure that no record are ever missing though I understand it is slower because we cannot use the bulk load
On Thursday, October 25, 2012 3:38:43 PM UTC-7, Emmanuel wrote:
> I got it to fully work correctly but there seems to be a specific naming > that has to be correct between the way you specify the virtual field using > the SurrogateKeyGenerator and the way you save it. Then the ID comes back > correct and work with bulkload :)
> Bottom line it works and allows me now to use the bulk load .
> Emmanuel
> On Thursday, October 25, 2012 3:27:00 PM UTC-7, Emmanuel wrote:
>> Ok , I found what is not working and why I could not see it working fully
>> The old row gets added with the incorrect SK because I have a virtual key >> setup.. and it uses the max(id) from my table. Something about that is not >> working correctly when generating the rows and virtual key are used.
>> Not sure yet how to fix it but I can reproduce this issue now
>> Emmanuel
>> On Thursday, October 25, 2012 3:19:51 PM UTC-7, Emmanuel wrote:
>>> So it does work with bulk upload and got it to fully work on a small >>> example. Just gotta watch out for the fact that you get a delete , then 2 >>> insert
>>> Emmanuel
>>> On Thursday, October 25, 2012 3:03:07 PM UTC-7, Emmanuel wrote:
>>>> The code here is interesting because it assumes that instead of doing >>>> an update and insert, to make good use of bulkupload, it does a delete and >>>> create 2 new rows :)
>>>> one for the old record and marked as expired, and one for the updated >>>> version.
>>>> This means that you get 1 delete and 2 insert. So the consequence is it >>>> needs table where you can write your own SK otherwise that will break . >>>> Also I hope AR can deal with this correctly because I recall AR not being a >>>> happy camper when setting an id yourself...
>>>> More to come as I debug it
>>>> On Monday, October 8, 2012 11:47:44 AM UTC-7, Thibaut Barrère wrote: