Perfomance of CLOB and VARCHAR

1,739 views
Skip to first unread message

Kai Schlamp

unread,
Jul 8, 2009, 8:02:32 PM7/8/09
to H2 Database
Yep, just another CLOB and VARCHAR posting ;-)
In the documentation it says that values bigger than 200 chars should
be stored in a CLOB.
I wonder what exactly the diffence is. Lets assume I store a string
with the length of 100000 chars in a CLOB column and also in a VARCHAR
column.
What advantage does CLOB give me?
A smaller size?
A better fetching and storing performance?
And now let's also assume I can't use
preparedStatement.setCharacterStream() for some reason. Would I still
benefit from CLOB for that large string?

Regards,
Kai

bob mcgee

unread,
Jul 9, 2009, 2:43:08 AM7/9/09
to H2 Database
From a combination of experience and poking into the source, the short
answers are:

> In the documentation it says that values bigger than 200 chars should
> be stored in a CLOB.
To be more specific: under 100 chars should always be CHAR/VARCHAR
(there are overheads for LOBs).
Over 1K chars: should almost always be a compressed LOB (either CLOB
or CLOB converted to UTF8 BLOB & optionally compressed).

> I wonder what exactly the diffence is. Lets assume I store a string
> with the length of 100000 chars in a CLOB column and also in a VARCHAR
> column.
I have done this. It works, although you can run into memory problems
in weird places. Generally a bad idea unless you force it to store
less rows in RAM or give H2 a HUGE heap size (over 1 GB).

> What advantage does CLOB give me?
Streaming -- prevents OutOfMemoryError from holding lots of very large
objects in memory (sometimes... I have had problems here)
Compression -- trade CPU for much smaller storage size (for very
compressible data, ie XML, this can sometimes improve performance)
Storage separate from main DB -- much faster with large files, and
would slow the whole DB down a lot otherwise.
Reduces main DB file size -- very important for filesystems limiting
you to <4 GB files (FAT32, etc).
Stores larger objects -- CHAR/VARCHAR is capped at 2 billion
characters (2 GB) due to using Java String type. CLOB can be 256 GB
or more.

> A smaller size?
Yes. MUCH smaller, if you use deflate compression. For a set of
60,000 webpages, deflate compression reduced my storage size from 2600
MB to 627 MB, in DB.
You can enable this with "SET COMPRESS_LOB DEFLATE", or use a BLOB to
store char data by using COMPRESS(STRINGTOUTF8(string_data),'DEFLATE')
(and then EXPAND(UTF8TOSTRING(binary_stored_data) to retrieve it). I
used the latter option because I think it might compress slightly
better, but I can't prove that.

> A better fetching and storing performance?
For big stuff, CLOBs are faster, for small stuff not so much. For
text over 1 MB, storing them in external LOB files makes a huge
performance difference. H2's LOB storage architecture is by-and-large
very smart. Exact details depends on compression, max_inplace_lob
setting, and filesystem. There is a Microsoft Research whitepaper on
the subject: http://research.microsoft.com/apps/pubs/default.aspx?id=64525
Additional details further below below.

> And now let's also assume I can't use
> preparedStatement.setCharacterStream() for some reason. Would I still
> benefit from CLOB for that large string?
Yes! Compression and external file storage can reduce DB size
tremendously and prevent problems with memory and performance of other
tables. See how long it takes to delete 1000 random varchar objects
of 100K apiece vs. 1000 same-size CLOBs (default settings).


** ADDITIONAL DETAILS (jargon): ***
For BLOBs not stored in-place separate files, there is a significant
overhead to opening/closing files for I/O, and the default
VALUE_INPLACE_LOB is a bit conservative. The overhead on file open/
close depends on your filesystem. NTFS in particular penalizes Java
file operations (anybody surprised here?) so the VALUE_INPLACE_LOB
should be increased, probably to at least 8192 kB. CLOBs also carry
some overhead versus VARCHAR: the DBMS tries to use streams rather
than storing LOBs in RAM, and has to store whether the LOB is
compressed and if it is in-place or separate file (and appropriate
info for sep. file stores). This tends to slow things down
*slightly.* When you are always or often forced to read the whole
string (as with a LIKE '%foo%' clause) VARCHAR will probably be a
little bit faster.

VARCHAR is also optimized to try and only store one copy of each
unique strings in RAM (see org. -- for short strings with many dupes,
this can save a lot of memory. For long strings it slows stuff when
checking for dupes. As far as I can tell (the code being complex)
CLOB does not do this. See: org.h2.value.ValueString.get(String s)
versus org.h2.value.ValueLob in the source.

The Microsoft whitepaper above isn't wholly aplicable, because MS SQL
Server doesn't behave the same as H2, but the gist is right. DB data
in MS SQL is more rigidly locked into DB pages, and doesn't
automatically migrate to the external LOBs like H2 does. This may
change with the new page store scheme in future H2 versions, of
course. In general, storing large objects in-line with the main table
data will cause fragmentation of the pagestore as the objects are
added, removed, and modified. This will make the DB grow less
efficient over time, just like fragmented filesystem. Ergo why H2
does not store large LOBs in-place.

If you're storing anything over 1 KB as CLOB, and you don't also have
incompressible BLOBs that would be adversely impacted by compression
settings, you should use compression. Text compresses really well,
and HTML and XML compress extremely well. In some cases with slow
disks and fast processors, compression will actually boost peformance
because you don't need to read as much from disk. XML should always
be compressed, when you can't avoid it altogether, because it is the
most bloated format known to man.

Apologies for any typos (it's early morning here), and I hope this is
helpful.

Regards,
Bob McGee

On Jul 8, 8:02 pm, Kai Schlamp <schl...@gmx.de> wrote:
> Yep, just another CLOB and VARCHAR posting ;-)



>
> Regards,
> Kai

turkan

unread,
Jul 9, 2009, 8:29:29 AM7/9/09
to H2 Database
Bob, thank you for your very detailed explanations. That really helped
a lot and should be made available in every database FAQ.

Thanks again,
Kai

Thomas Mueller

unread,
Jul 12, 2009, 4:06:58 PM7/12/09
to h2-da...@googlegroups.com
Hi,

Bob's explanation is very good, and I will try to add some of it to
the documentation at
http://www.h2database.com/html/advanced.html#large_objects and
http://www.h2database.com/html/datatypes.html#clobtype

Regards,
Thomas
Reply all
Reply to author
Forward
0 new messages