Hash cluster, parameter SIZE

abracadabuda

unread,

Nov 26, 2009, 4:32:15 AM11/26/09

to

Hi everyone.

I have a table with transactions of customers. There are about 20
transactions per month for a customer, I need to store history of 24
months. There is about 1 500 000 customers. Average length of row is
300 bytes.

The only select statements on this table would be count of
transactions, balance of transactions, etc., etc of one customer.
There are no updates, inserting and deleting is running in batch. So I
decided to use hash cluster, with hash on customer_number.

Size of one db block is 8KB. One block therefore cannot store all
records of one customer, so I tried to set parameter SIZE of the
cluster to a higher value. I calculated the space for one custmer = 20
transations * 24 months * 300 bytes = 144 000 bytes, that's 140,625
KB, so I decided to use the SIZE of 144KB.

I tried to create the cluster with mentioned SIZE (144KB), but I got
this error message:
Error at line 1
ORA-02229: invalid SIZE option value

Action, I googled for this error is:
Specify an appropriate value.

I tried to decrease the SIZE value, the highest accepted SIZE value
was 32766 bytes (it didn't even accept 32K)

Only restriction I found in Oracle documentation is, that it has to be
an integer value. There is no restriction when specifing SIZE
parameter when creating cluster:
http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/statements_5001.htm

The only restriction I found when specifing SIZE is a note here:
http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/clauses008.htm
Note:
Not all multiples of bytes are appropriate in all cases, and context-
sensitive limitations may apply. In the latter case, Oracle issues an
error message.

Create statement for my cluster:
CREATE CLUSTER CLS$TEST_HASH
(
cust_no VARCHAR2(10)
)
TABLESPACE DATL_TEST
PCTUSED 0
PCTFREE 0
INITRANS 2
MAXTRANS 255
SIZE 144K
STORAGE (
INITIAL 256K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
HASHKEYS 2000003
HASH IS mod(to_number(cust_no), 2000003)
NOROWDEPENDENCIES
NOCACHE
NOPARALLEL;

Can you please help me? I cannot find anything useful anywhere :( What
are the context-sensitive limitations for creating cluster with
specified SIZE?

I'm using Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 -
64bi

Thanks a lot.
abracadabuda

hpuxrac

unread,

Nov 26, 2009, 12:48:47 PM11/26/09

to

On Nov 26, 4:32 am, abracadabuda <abracadab...@gmail.com> wrote:

snip

> Hi everyone.
>
> I have a table with transactions of customers. There are about 20
> transactions per month for a customer, I need to store history of 24
> months. There is about 1 500 000 customers. Average length of row is
> 300 bytes.
>
> The only select statements on this table would be count of
> transactions, balance of transactions, etc., etc of one customer.
> There are no updates, inserting and deleting is running in batch. So I
> decided to use hash cluster, with hash on customer_number.
>
> Size of one db block is 8KB. One block therefore cannot store all
> records of one customer, so I tried to set parameter SIZE of the
> cluster to a higher value. I calculated the space for one custmer = 20
> transations * 24 months * 300 bytes = 144 000 bytes, that's 140,625
> KB, so I decided to use the SIZE of 144KB.
>
> I tried to create the cluster with mentioned SIZE (144KB), but I got
> this error message:
> Error at line 1
> ORA-02229: invalid SIZE option value
>
> Action, I googled for this error is:
> Specify an appropriate value.
>
> I tried to decrease the SIZE value, the highest accepted SIZE value
> was 32766 bytes (it didn't even accept 32K)
>
> Only restriction I found in Oracle documentation is, that it has to be
> an integer value. There is no restriction when specifing SIZE

> parameter when creating cluster:http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/s...
>
> The only restriction I found when specifing SIZE is a note here:http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/clause...

32k is largest block size that oracle supports ...

abracadabuda

unread,

Nov 27, 2009, 3:43:25 AM11/27/09

to

> 32k is largest block size that oracle supports ...

Thanks, i got it now.
I misreaded this:
If SIZE is larger than the data block size, then the database uses the
operating system block size, reserving at least one data block for
each cluster or hash value.

I haven't mentioned, that the size of my cluster was +/- 2000003 *
8KB. My os_block_size is set to 1K, db_block_size is set to 8K.
Therefore the largest real accepted SIZE was 8KB.

But anyway, can you recommend me any other solution, how to store such
amount of data of one customer close one to the other? Partitioning is
not a good idea...

Thanks a lot.

abracadabuda

Jonathan Lewis

unread,

Nov 28, 2009, 8:44:42 AM11/28/09

to

"abracadabuda" <abraca...@gmail.com> wrote in message
news:550aaa84-fa1c-4610...@e27g2000yqd.googlegroups.com...

Why is partitioning not a good idea ? (Apart from the license fees)
Technically you haven't said anything that makes it unsuitable, and
if you're keeping your data for 24 months it's wise to consider
what you're going to do to get rid of the old data after 25 months.

Having said that, you could consider storing the data as an IOT
(index organised table). Data for one customer for one month
is about 6,000 bytes, which means roughly one block, so for 24
months it would be something like 24 blocks - except you'll have
to add a few because the index will probably run at about 50%
utilisation.

I assume that you will have a primary key of
(customer_id, transaction_date, sequence_number)
where the sequence number exists to allow for two transactions
for a customer on a given date. Create the primary with compression
on the first column.

This is a case where you could argue for a larger block size -
especially if your queries always range over more than one month -
as this would allow a single block I/O to fetch a larger chunk of
history. On the other hand, if most of your queries are just for the
last couple of weeks, you might as well stay with the typical 8KB.

An alternative would be an index cluster - clustered on customer id,
but with an extra index on (customer_id, transaction_date). The
clustering would pull data into a small number of (scattered) blocks,
time-dependency would pack each customer's data in a time-related
fashion, and the second index would allow you to pick data for a small
date range for a single customer without having to visit all the blocks
for that customer.

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com