Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

LOB segment space usage

138 views
Skip to first unread message

andreik

unread,
Aug 17, 2007, 8:05:38 AM8/17/07
to
Hello,

can anyway here explain me, what is oracle storing inside the LOB
segments besides the data which I am telling it to store?

here's a test.
10.2.0.3 EE Solaris 5.10 sparc 64bit (have tried the same on 32bit
Linux, result was the same)

1. create a tablespace with 2K blocksize called "smallblock"
2. create a table with a nclob type field and use smallblock
tablespace to store the LOB segment:

create table t1 (i int, b nclob)
lob (b) store as t1_lob (
chunk 2K
disable storage in row
tablespace smallblock
pctversion 0);

note the "disable storage in row" this makes the LOB to be out-of-line
and pctversion 0 is supposed to mean, than I don't want to waste any
space on concurrency.

3. use the Tom Kyte's show_space() procedure to get the initial
picture:

SCOTT@andrkydb> exec show_space ('T1_LOB','SCOTT','LOB');
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 0
Total Blocks............................ 32
Total Bytes............................. 65,536
Total MBytes............................ 0
Unused Blocks........................... 28
Unused Bytes............................ 57,344
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 33
Last Used Block......................... 4

The parameter which I'm looking at is "Full Blocks". It is zero at the
moment, which is very correct.

4. insert a row, passing 2048 bytes of data into nclob field and take
a look at what happens:

SCOTT@andrkydb> insert into t1 values (0, rpad('*',2048,'*'));

1 row created.

SCOTT@andrkydb> commit;

Commit complete.

SCOTT@andrkydb> exec show_space ('T1_LOB','SCOTT','LOB');
Unformatted Blocks ..................... 25
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 3
Total Blocks............................ 32
Total Bytes............................. 65,536
Total MBytes............................ 0
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 33
Last Used Block......................... 32

So why did it need to use 3 blocks (that's 6144 bytes) instead of just
1 to store the poor 2048 bytes?

SCOTT@andrkydb> select (dbms_lob.getlength(b)) from t1;

(DBMS_LOB.GETLENGTH(B))
-----------------------
2048

What causes such an overhead?

In experimental way I have discovered that only ~990 bytes of data
fits into a single block (if nclob is used):

SCOTT@andrkydb> insert into t1 values (0, rpad('*',990,'*'));

1 row created.

SCOTT@andrkydb> exec show_space ('T1_LOB','SCOTT','LOB');
Unformatted Blocks ..................... 24
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 4
Total Blocks............................ 32
Total Bytes............................. 65,536
Total MBytes............................ 0
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 33
Last Used Block......................... 32

This kind of explains why do I have a LOB segment of size ~5GB in a
prod database, which stores only 2,2GB of useful data.

In case of CLOB, overhead is slightly smaller. The whole 1800 bytes
fit into a 2048 bytes block. Which is still kind of strange and what
is worse, it is nowhere mentioned in the docs.

I have also tried to load a bigger volumes of data. For example I have
loaded 30K into a LOB with 8K chunks and got 8 blocks filled, which
makes 8 x 8192 = 65536 bytes. Which is exactly 2 times bigger number
than it should have been.

I have searched metalink and docs and forums and have found nothing.
Here -> http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14249/adlob_tables.htm#i1006363
it says absolutely nothing about such an overhead. On the contrary, is
it said there that:
[Varying-width character data in CLOB and NCLOB datatypes is stored in
an internal format that is compatible with UCS2 Unicode character set
format. This ensures that there is no storage loss of character data
in a varying-width format]

note the "no storage loss" part ;)

How can I be estimating my data structures in this case? Such a
behavior makes this task very difficult if not impossible. Or do I
need to simply always multiply by 2? (if so then I want to see it in
the docs)
Why would I use oracle to store my large objects, if I know I'm going
to waste about 2 or 3 times more disk space than storing them simply
inside a file system...?

Thanks for your time!

-- Andrei

sybrandb

unread,
Aug 17, 2007, 8:19:57 AM8/17/07
to
> Here ->http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14249/adlob_...

> it says absolutely nothing about such an overhead. On the contrary, is
> it said there that:
> [Varying-width character data in CLOB and NCLOB datatypes is stored in
> an internal format that is compatible with UCS2 Unicode character set
> format. This ensures that there is no storage loss of character data
> in a varying-width format]
>
> note the "no storage loss" part ;)
>
> How can I be estimating my data structures in this case? Such a
> behavior makes this task very difficult if not impossible. Or do I
> need to simply always multiply by 2? (if so then I want to see it in
> the docs)
> Why would I use oracle to store my large objects, if I know I'm going
> to waste about 2 or 3 times more disk space than storing them simply
> inside a file system...?
>
> Thanks for your time!
>
> -- Andrei

I think you would need to dump data blocks to a trace file to look at
the actual gory details.
The dump will be in a symbolic format (alter database dump ...)

You would store large objects as BLOBs when you want to edit them
through your application.
If you don't need to edit them, or shouldn't edit them, because they
can't be changed, a BFILE would suffice.

--
Sybrand Bakker
Senior Oracle DBA

Maxim Demenko

unread,
Aug 17, 2007, 10:29:50 AM8/17/07
to andreik
andreik schrieb:

> So why did it need to use 3 blocks (that's 6144 bytes) instead of just
> 1 to store the poor 2048 bytes?

As yourself mentioned, the data will be stored in the fixed width
UCS2(like) character set, that means, you store not 2048 bytes, but 2048
characters, each of them requires 2 bytes of storage.
You can probably see it if you follow the advice from Sybrand and dump
your blocks, or may be you would like to read a nice white paper from
Tanel Pőder
http://integrid.info/Poder_LOB_Internals.pdf
about LOB internals.

Best regards

Maxim

andreik

unread,
Aug 17, 2007, 11:26:46 AM8/17/07
to
On Aug 17, 5:29 pm, Maxim Demenko <mdeme...@gmail.com> wrote:
> andreik schrieb:
>
> > So why did it need to use 3 blocks (that's 6144 bytes) instead of just
> > 1 to store the poor 2048 bytes?
>
> As yourself mentioned, the data will be stored in the fixed width
> UCS2(like) character set, that means, you store not 2048 bytes, but 2048
> characters, each of them requires 2 bytes of storage.
> You can probably see it if you follow the advice from Sybrand and dump
> your blocks, or may be you would like to read a nice white paper from
> Tanel Põderhttp://integrid.info/Poder_LOB_Internals.pdf

> about LOB internals.
>
> Best regards
>
> Maxim

this is just not right. I have UTF8 as national character set and I
store ascii characters, so why does it have to use two bytes for that.
But ok, consider another test. This time I used CLOB as the LOB type.

SCOTT@andrkydb> truncate table t1;

Table truncated.

SCOTT@andrkydb> insert into t1 values (1, rpad('*',2000,'*'));

1 row created.

SCOTT@andrkydb> exec show_space('T1_LOB','SCOTT','LOB');
Unformatted Blocks ..................... 26


FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0

Full Blocks ..................... 2


Total Blocks............................ 32
Total Bytes............................. 65,536
Total MBytes............................ 0
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 33
Last Used Block......................... 32

It has used 2 blocks to store 2000 characters.

Now when I store 1800, then it needs only 1 block:

SCOTT@andrkydb> truncate table t1;

Table truncated.

SCOTT@andrkydb> insert into t1 values (1, rpad('*',1800,'*'));

1 row created.

SCOTT@andrkydb> exec show_space('T1_LOB','SCOTT','LOB');
Unformatted Blocks ..................... 27


FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0

Full Blocks ..................... 1


Total Blocks............................ 32
Total Bytes............................. 65,536
Total MBytes............................ 0
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 33
Last Used Block......................... 32

This way the 2-byte unicode explanation stops to make sence. Since
2048/2 != 1800
In other words it would not be possible to store 1800 bytes in a
single block if 2 bytes were used for each character.

Regarding the block dump, what am I supposed to do will all those hex
values from the dump file? I am not ready to get THAT deep ;)


Maxim Demenko

unread,
Aug 17, 2007, 5:11:58 PM8/17/07
to andreik
andreik schrieb:

> On Aug 17, 5:29 pm, Maxim Demenko <mdeme...@gmail.com> wrote:
>> andreik schrieb:


I can hardly explain your results, because my tests can't confirm them
SQL> SET SERVEROUTPUT ON
SQL> SELECT * FROM V$VERSION
2 /

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
PL/SQL Release 10.2.0.3.0 - Production
CORE 10.2.0.3.0 Production
TNS for Linux: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

SQL> SELECT VALUE
2 FROM NLS_DATABASE_PARAMETERS
3 WHERE PARAMETER = 'NLS_CHARACTERSET'
4 /

VALUE
-----------------------------------------------------------------------------------------------------------------------------
AL32UTF8

SQL> CREATE TABLESPACE smallblock DATAFILE
'/opt/oracle/oradata/ORA102/smallblock01.dbf' SIZE 10M
2 EXTENT MANAGEMENT LOCAL AUTOALLOCATE SEGMENT SPACE MANAGEMENT AUTO
BLOCKSIZE 2K
3 /

Tablespace created.

SQL> create table t1 (i int, b clob)
2 lob (b) store as t1_lob (
3 chunk 2K
4 disable storage in row
5 tablespace data2k
6 pctversion 0)
7 /

Table created.

SQL> insert into t1 values (1, rpad('*',994,'*'))
2 /

1 row created.

SQL> exec show_space('T1_LOB',user,'LOB')


Unformatted Blocks ..................... 27
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 1
Total Blocks............................ 32
Total Bytes............................. 65,536
Total MBytes............................ 0
Unused Blocks........................... 0
Unused Bytes............................ 0

Last Used Ext FileId.................... 6
Last Used Ext BlockId................... 97
Last Used Block......................... 32

PL/SQL procedure successfully completed.

SQL> truncate table T1
2 /

Table truncated.

SQL> insert into t1 values (1, rpad('*',995,'*'))
2 /

1 row created.

SQL> exec show_space('T1_LOB',user,'LOB')


Unformatted Blocks ..................... 26
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 2
Total Blocks............................ 32
Total Bytes............................. 65,536
Total MBytes............................ 0
Unused Blocks........................... 0
Unused Bytes............................ 0

Last Used Ext FileId.................... 6
Last Used Ext BlockId................... 97
Last Used Block......................... 32

PL/SQL procedure successfully completed.

As everybody can see - 994 '*' characters fit into one block, 995 no
more. And here is what i see in the blockdump - as you may guess, the
'*' is represented as 0x2A:

...snipped
-----------------------------------
Auxillary Map
--------------------------------------------------------
Extent 0 : L1 dba: 0x01800021 Data dba: 0x01800025
--------------------------------------------------------

Second Level Bitmap block DBAs
--------------------------------------------------------
DBA 1: 0x01800023

buffer tsn: 6 rdba: 0x01800025 (6/37)
scn: 0x0000.001e6e00 seq: 0x02 flg: 0x04 tail: 0x6e002802
frmt: 0x02 chkval: 0xb3cb type: 0x28=PAGETABLE MANAGED LOB BLOCK
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0CD97600 to 0x0CD97E00
CD97600 00006228 01800025 001E6E00 04020000 [(b..%....n......]
CD97610 0000B3CB 0000D388 01000000 01000000 [................]
CD97620 00002E71 00000000 00000000 00000000 [q...............]
CD97630 01800021 00000000 2A002A00 2A002A00 [!........*.*.*.*]
CD97640 2A002A00 2A002A00 2A002A00 2A002A00 [.*.*.*.*.*.*.*.*]
Repeat 122 times
CD97DF0 2A002A00 2A002A00 2A002A00 6E002802 [.*.*.*.*.*.*.(.n]
Long field block dump:
Object Id 54152
LobId: 00010001712E PageNo 0
Version: 0x0000.00000000 pdba: 25165857
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a
-----------------------------------
...snipped


So, from 4 sides - block dump, dbms_space package,Tanel Pőder Whitepaper
and last but not least - Oracle Documentation i get consistent results
saying , that characters are stored in lob by 2 bytes per character -
and from your side the opposite.
Now tell me please - what should i rely on?

Best regards

Maxim

sybr...@hccnet.nl

unread,
Aug 18, 2007, 2:02:10 AM8/18/07
to
On Fri, 17 Aug 2007 15:26:46 -0000, andreik <spamme....@gmail.com>
wrote:

>Regarding the block dump, what am I supposed to do will all those hex
>values from the dump file? I am not ready to get THAT deep ;)
>

I didn't try it myself, but I was hoping the record overhead would
(outside the actual data) would clearly show.

Jonathan Lewis

unread,
Aug 18, 2007, 7:48:27 AM8/18/07
to

Pulling together various comments from
your posts, here:

A 2KB block allows 1,988 bytes of space for LOB
storage - the remaining 60 bytes is the standard block
overhead plus a few bytes of data specific to the LOB,
such as the object number, LOB ID, and SCN. So if
your character set is single byte then you can get 1,988
characters per block.

As the note you've quoted says, if you use a multibyte character
set as your database character (or for the National Character
Set if you use NCLOBs) the data is stored as a fixed two-byte
width - so you can only store 994 characters per block.

You don't have to worry too much about subtle details with
block dumps to get some useful information: For example:

create a block using rpad('x',2000) then dump the LOB blocks.
If the clob has been stored as a single byte character set you will
see lots of
78 78 78 78 78 78
in the block.

If it's been stored as the fixed two-byte set you will see
00 78 00 78 00 78 00 78 00 78

The note you've quoted about:


"This ensures that there is no storage loss of character data
in a varying-width format"

is not trying to say you don't lose space in the data segment,
it's trying to say that you don't lose detail in the character data.
For example, you don't end up with code that a character with
an acute accent (say) and find that it comes back without it's
accent.

Your comment about multiplying by two is appropriate - for
multibyte character sets - it's the price you have to pay if you
need the character precision. Don't forget you also pay in
whole CHUNKs - a single character LOB stored out of line
takes up a whole CHUNK, so choose your chunk-size with
extreme care.

There's an little note on my blog about reviewing LOB storage,
but I don't think it will tell you anything you don't know already:

http://jonathanlewis.wordpress.com/2007/05/11/lob-sizing/


--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"andreik" <spamme....@gmail.com> wrote in message
news:1187352338.2...@22g2000hsm.googlegroups.com...


> Hello,
>
> can anyway here explain me, what is oracle storing inside the LOB
> segments besides the data which I am telling it to store?
>
>

andreik

unread,
Aug 18, 2007, 9:06:35 AM8/18/07
to
On Aug 18, 2:48 pm, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:


Thank you for your clear explanation. I have been thinking it all over
again and I see where I was mislead now.
I didn't know about the 60K block overhead. So now I know it. And now
I get a full picture of what's going on inside lobs.

I have learned a lot new during this "investigation", so I'm quite
happy about it now :) Thanks.

-- Andrei

andreik

unread,
Aug 18, 2007, 9:12:36 AM8/18/07
to
On Aug 18, 12:11 am, Maxim Demenko <mdeme...@gmail.com> wrote:
> andreik schrieb:
>
> > On Aug 17, 5:29 pm, Maxim Demenko <mdeme...@gmail.com> wrote:
> >> andreik schrieb:
>
> I can hardly explain your results, because my tests can't confirm them
> SQL> SET SERVEROUTPUT ON
> SQL> SELECT * FROM V$VERSION
> 2 /
>
> BANNER
> ----------------------------------------------------------------
> Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
> PL/SQL Release 10.2.0.3.0 - Production
> CORE 10.2.0.3.0 Production
> TNS for Linux: Version 10.2.0.3.0 - Production
> NLSRTL Version 10.2.0.3.0 - Production
>
> SQL> SELECT VALUE
> 2 FROM NLS_DATABASE_PARAMETERS
> 3 WHERE PARAMETER = 'NLS_CHARACTERSET'
> 4 /
>
> VALUE
> -----------------------------------------------------------------------------------------------------------------------------
> AL32UTF8

You have UTF and your database charset that's why your LOBs have 2
bytes per character even in CLOBS. And I have a simple single-char
character set in my database so my clobs were not supposed to be 2
byted.

Anyway, the 60K overhead per block was the answer to my question.

But thanks for your help! Tanel's document is a real good one. Helped
me a lot.

-- Andrei

0 new messages