DPF System - IBM Power 780 Server Model 9179-MHB
The number 1 ranked TPC-C benchmark result with DB2 9.7 uses a DPF system
with 96 partitions across 3 physical servers.
There are 9 tables in the benchmark, and 8 of them share a common column
(warehouse code) that is used as the hash key to avoid cross-partition
joins. The 9th table (ITEM) does not have a warehouse code, so it is created
as a replicated MQT to distribute a copy to each node. So far this sounds
like normal DPF design.
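For reference, the kind of hash distribution I have in mind looks roughly
like this (a simplified Python sketch; the 96-partition count comes from the
benchmark, but the 4096-entry distribution map and the hash function are my
assumptions for illustration, not DB2's actual internals):

```python
# Simplified sketch of DPF-style hash distribution (illustration only;
# DB2's real hash function and distribution map differ in detail).
NUM_PARTITIONS = 96   # 96 partitions, as in the benchmark configuration
MAP_SIZE = 4096       # assumed distribution-map size

# Spread the partitions round-robin over the distribution map.
distribution_map = [slot % NUM_PARTITIONS for slot in range(MAP_SIZE)]

def partition_for(warehouse_code: int) -> int:
    """Map a distribution-key value to a database partition."""
    slot = hash(warehouse_code) % MAP_SIZE  # stand-in for DB2's internal hash
    return distribution_map[slot]
```

The point is that every row carrying the same warehouse code lands on the
same partition, so the 8 tables hashed on warehouse code can be joined
without any cross-partition data movement.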
But is there code at the application level that routes transactions to the
correct physical server (there are 3) based on a hash of the warehouse code
computed on the application side? Or do all transactions go to the same
coordinator node on one of the machines (as would normally be the case in a
data warehouse)? I assume it is the former, but I would like to understand
how that is done.
Single Server System - IBM Power 595 Server Model 9119-FHA
The number 3 ranked TPC-C benchmark result is also DB2, but uses a single
server without DPF. However, all tables except ITEM were horizontally
"partitioned" into multiple tables (not using table partitioning or anything
like that, but simply by creating multiple tables). For example, there are
384 CUSTOMER tables (CUSTOMER1 to CUSTOMER384).
Each table partition for STOCK, CUSTOMER, ORDERS and ORDERLINE contains data
associated with a range of 1,350 warehouses.
Each table partition for WAREHOUSE, DISTRICT, NEWORDER and HISTORY contains
data associated with a range of 8,100 warehouses.
For each "partitioned" table, a view was created over all table partitions
to provide full transparency of data manipulation. No tables were
replicated.
Again, my question is how the application knows which table a particular
customer belongs in (CUSTOMER1 to CUSTOMER384). There is a UNION ALL view
over these tables for retrieval, but there is no range predicate on it, so
the application code must calculate the table-number suffix, and I was
wondering how that is done.
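Given the ranges quoted above, one plausible scheme is a simple integer
division on the warehouse ID (this is my guess at the routing logic, not
something taken from the disclosure report):

```python
# Hypothetical table-suffix calculation, assuming contiguous 1-based
# warehouse ranges as described in the disclosure report:
#   CUSTOMER/STOCK/ORDERS/ORDERLINE: 1,350 warehouses per table partition
#   WAREHOUSE/DISTRICT/NEWORDER/HISTORY: 8,100 warehouses per table partition

def table_suffix(w_id: int, warehouses_per_table: int) -> int:
    """Return the 1-based table-number suffix for a given warehouse ID."""
    return (w_id - 1) // warehouses_per_table + 1

print(table_suffix(1, 1350))       # -> 1   (CUSTOMER1)
print(table_suffix(518400, 1350))  # -> 384 (CUSTOMER384)
```

Note that 384 tables x 1,350 warehouses = 518,400 warehouses in total, which
implies 64 table partitions for the 8,100-warehouse tables (64 x 8,100 =
518,400), so the two range schemes are consistent.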
When you look at the non-clustered full disclosure report you find these
clauses:
INDEX IN ts_ware_062
ORGANIZE BY KEY SEQUENCE (
W_ID STARTING FROM 494101 ENDING AT 502200
The key sequence presents a partitioning (of course). DB2 uses it in the
same way it would use a check constraint to derive the partition
information.
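The numbers in that clause line up with the tablespace name, assuming
1-based contiguous ranges (a quick sanity check, not taken from the report):

```python
# Sanity-check the ORGANIZE BY KEY SEQUENCE range against the
# tablespace name ts_ware_062.
start, end = 494101, 502200

width = end - start + 1
print(width)                  # -> 8100 warehouses per table partition

slot = (start - 1) // width + 1
print(slot)                   # -> 62, consistent with ts_ware_062
```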
As for the clustered result, I'm not familiar with the client-side part of
the TPC-C application, but there is a passage in the full disclosure
overview:
5.8.1. Transaction Flow
...
COM+ routes the transaction and balances the load according to the options
defined in the Component Services GUI for the COM+ server application and
settings in the Windows 2008 Registry. The configuration file and registry
variables are listed in Appendix B.2. At the beginning, each TPC-C user
sends a pair of HTML 1.0 requests submitting its unique warehouse and
district to the IIS ISAPI handler. Upon successful validation of the user's
login, IIS then displays an HTML form which encapsulates the TPC-C
transaction menu.
...
--
Serge Rielau
SQL Architect DB2 for LUW, IBM Toronto Lab
Blog: tinyurl.com/SQLTips4DB2
Wiki: tinyurl.com/Oracle2DB2Wiki
Twitter: srielau