On Thursday, 8 October 2015 22:20:02 UTC+13, vvman...@gmail.com wrote:
> I would like to know whether there is any way to get good performance for updates in DB2 BLU-enabled tables. The application is OLAP, using DataStage (an ETL tool) and SQL Replication to load the data into the tables.
>
> Thanks In Advance
Hi, since nobody else has answered yet, I'll offer some suggestions based on published theory and normal good batch practice. Sadly I cannot yet claim any real-world project experience with BLU, so I strongly suggest you do some experimentation to verify what actually works best.
I'm also unsure from your question whether you meant UPDATEs in the SQL command sense or the broader sense of inserts, updates and deletes.
There is an IBM paper available called "DB2 with BLU Acceleration: So Much More than Just a Column Store". It has a lot of detail about BLU internals and I suggest you read that if you haven't already.
From that paper we know that BLU is optimized for good insert performance. An insert buffer (separate from the bufferpool) is used to reduce logging and bufferpool latching costs by buffering individual inserts so that they can be applied in a single operation once the buffer fills or the application commits. This is why the manuals recommend that you perform at least 100 inserts before committing; I think the paper suggests perhaps 1000. Regardless, for maximum efficiency you need to fill that buffer, so larger commit intervals are more efficient than small ones, just as with a regular row-organized batch (only more so).
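As a rough sketch (the table and column names here are invented purely for illustration), the pattern is simply to group many inserts into a single unit of work:

    -- hypothetical fact table; the point is one COMMIT covering many rows
    INSERT INTO sales_fact (sale_date, store_id, amount)
      VALUES ('2015-09-01', 101, 19.99),
             ('2015-09-01', 102, 45.00),
             ('2015-09-01', 103,  7.50);
    -- ... keep inserting until roughly 1000 rows are pending in the transaction ...
    COMMIT;

In DataStage terms that would typically mean raising the commit/record count setting on the target stage rather than committing every row, though check your particular connector for the exact property.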
Deletes are logical deletes, effected by flipping a flag against the row (in an internal TupleState column). The paper states that multiple deletes on the same page from a single transaction are logged in a single record. So again, larger commit intervals will help performance; additionally, if you can process deletes in a way that hits the same page multiple times within the same transaction, in theory that will help too. In practice that may not be easy, but if, for example, you know your data was inserted in sales-date order, then performing deletes batched by sales date (combined with a large commit interval) should be more efficient and hence faster, as in the sketch below.
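For example (again with invented table and column names), something along these lines should tend to hit each page only once per transaction:

    -- hypothetical: data was loaded in sale_date order, so rows for a given
    -- date sit on neighbouring pages; delete a whole date range in one transaction
    DELETE FROM sales_fact
      WHERE sale_date BETWEEN '2015-09-01' AND '2015-09-07';
    COMMIT;   -- one commit covering all the logically deleted rows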
Updates are performed internally as a delete followed by an insert, so a similar approach should give the best performance.
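The same batching trick would look something like this (names invented, as above):

    -- hypothetical: each updated row becomes a logical delete plus a new insert,
    -- so batch by the same natural order and keep the commit interval large
    UPDATE sales_fact
      SET amount = amount * 1.05
      WHERE sale_date BETWEEN '2015-09-01' AND '2015-09-07';
    COMMIT;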
Hope this is some help.
Jeremy