Just my 2 cents.
I'm in favour of the customisable file extension, but I don't think
custom data should be allowed before or after the header. I don't think
anything except H2 itself should write to the database file, if only
for reliability. If you want to store extra data, such as configuration,
it can be stored in an H2 table.
Also, about having data before or after the file header: if you change
the length of the data in front, the entire database file has to be
shifted along, and every time H2 changes the length of the database
file, the data that follows the end of it has to be copied. That would
be a huge performance hit as the database grows. It would be much
better to store the data in an H2 table and let H2 manage it.
Anyway, just my thoughts.
Cheers, Ryan
For backward compatibility I guess the current default suffix ".h2.db"
will stay for a while. The earliest possible change is H2 version 1.4,
and I'm not sure if it makes sense yet. First I want to get rid of all
the other database files (lob files, temp files). The .trace.db file
may get renamed to .log at some point (not sure yet).
There is a feature request for "Database file name suffix: a way to
use no or a different suffix (for example using a slash).". That means
if you use the database URL "jdbc:h2:~/test/" then it would create a
file named "test". This is not implemented yet (patches are welcome),
but what do you think about it?
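
To make the proposed rule concrete, here is a minimal sketch. The class
DbFileName and the method fileNameFor are hypothetical names made up for
this example; this is not H2's code, and it ignores URL settings after ";".

```java
// Hypothetical sketch of the proposed naming rule; not H2's actual code.
final class DbFileName {
    // The current default suffix mentioned above.
    static final String DEFAULT_SUFFIX = ".h2.db";

    // Maps a "jdbc:h2:" URL to a database file name.
    // A trailing slash means "no suffix" (the proposal);
    // otherwise the default suffix is appended.
    static String fileNameFor(String url) {
        String prefix = "jdbc:h2:";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("Not an H2 URL: " + url);
        }
        String path = url.substring(prefix.length());
        if (path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path + DEFAULT_SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("jdbc:h2:~/test"));
        System.out.println(fileNameFor("jdbc:h2:~/test/"));
    }
}
```

With this rule "jdbc:h2:~/test" still maps to "~/test.h2.db", so existing
databases keep working, while "jdbc:h2:~/test/" maps to "~/test".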
> -- H2 0.5/B -- and it may have three lines of that header
Yes, the first is the regular header, the second and third may be
encrypted (for encrypted databases).
> -- H2 0.5/B D 1.2.147--
I will consider this for the future. There is already the CREATE_BUILD
setting in the database file. Plus, there is a read-version and
write-version in the header (see PageStore.java class javadoc).
What about h2database.com as the header? Or h2database.org.
The idea is that the header is one page long, and never changes once
the database is created. If it could change then there is a problem
how to change it in a transactional way (with possible power failure
while it's written). The second and third page contain header data
that can change (both pages are supposed to contain the exact same
data, so that this data can be changed in a transactional way, and
recover from a power failure).
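
A rough sketch of the "two identical pages" idea, to show why it survives
a torn write. The layout used here (a counter, one int value and a CRC32
checksum) and the tiny page size are assumptions for the example only;
this is not the PageStore format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative sketch only; the page layout is an assumption, not H2's format.
final class DualHeader {
    static final int PAGE_SIZE = 64; // tiny page, just for the demo
    static final int BODY = 12;      // 8-byte counter + 4-byte value

    // Encodes one variable header page: counter, value, checksum of both.
    static byte[] encode(long counter, int value) {
        ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE);
        page.putLong(counter).putInt(value);
        CRC32 crc = new CRC32();
        crc.update(page.array(), 0, BODY);
        page.putInt((int) crc.getValue());
        return page.array();
    }

    // Returns true if the stored checksum matches the page content.
    static boolean isValid(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, BODY);
        return ByteBuffer.wrap(page).getInt(BODY) == (int) crc.getValue();
    }

    // Writes the same data to page 1 and then to page 2 (page 0 is the fixed header).
    static void write(byte[] file, long counter, int value) {
        byte[] page = encode(counter, value);
        System.arraycopy(page, 0, file, PAGE_SIZE, PAGE_SIZE);
        System.arraycopy(page, 0, file, 2 * PAGE_SIZE, PAGE_SIZE);
    }

    // Reads the value from the valid copy with the highest counter.
    static int read(byte[] file) {
        long bestCounter = -1;
        int bestValue = 0;
        for (int i = 1; i <= 2; i++) {
            byte[] page = Arrays.copyOfRange(file, i * PAGE_SIZE, (i + 1) * PAGE_SIZE);
            if (!isValid(page)) {
                continue; // torn or corrupt copy, use the other one
            }
            ByteBuffer b = ByteBuffer.wrap(page);
            long counter = b.getLong();
            int value = b.getInt();
            if (counter > bestCounter) {
                bestCounter = counter;
                bestValue = value;
            }
        }
        if (bestCounter < 0) {
            throw new IllegalStateException("Both header copies are corrupt");
        }
        return bestValue;
    }
}
```

Because the two copies are written one after the other, a power failure
can damage at most the copy being written, and the reader falls back to
the other one.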
Regards,
Thomas
> using a long string like 'h2database.xxx' may be
> a bit too long.
The first page (usually 2 KB) doesn't ever change, and there is a lot
of unused space there.
> why may the second and third line of the '-- H2 0.5/B --' header
> get encrypted for encrypted databases
See the encrypted file source code, SecureFileStore.java
> Bundling CLOBs and BLOBs into the main database file can make the
> main database file very bulky and slow down H2 significantly.
There are many advantages to using one file, for example it prevents the
'too many open files' problems. Also, there are many problems caused
by lost / deleted files. De-duplication is easier. One disadvantage is
that the database file can't shrink quickly after deleting many LOBs.
Regards,
Thomas
That way you get the best of both: only two files, one of them a specialized fileStore that contains all LOB columns (used only if LOBs are referenced), which can make a locator implementation easier too.
regards,
Dario.
On 17/12/10 09:49, Thotheolh wrote:
> would a single storage file slow down H2 engine from looking
> for and getting or writing data since
No. The only problem (I know) is what I have already described.
> What do you think of using only 2 files, one for all normal columns and another specialized for long data types (LOBs)?
I thought about this, but I would try to avoid it if possible. What
would be the advantage?
> that can facilitate locator's implementation too
How?
> I did propose to split the huge storage into CLOBs and BLOBs so that there wouldn't be a need to mix binary and character based storage
That's no problem at all.
Regards,
Thomas
On 20/12/10 17:30, Thomas Mueller wrote:
>> would a single storage file slow down H2 engine from looking
>> for and getting or writing data since
> No. The only problem (I know) is what I have already described.
I doubt that this is really the case. In a database with many LOB columns and rows, this single file can easily grow to a size where it becomes a disadvantage.
A single LOB field can be as large as several full tables, or even exceed the size of the rest of the database.
Fragmentation at file system (OS) level will have much more impact on large files, caching (at OS level) will be less effective, read-ahead will be less effective too, and finally the IO load will inevitably increase.
It is easy to measure how the performance of a database degrades as the data volume increases significantly. I mean, if a database without LOBs is 1 GB in size and with LOBs goes over 10 GB, it would be very optimistic to think that the overall performance will not change. Just imagine defragmenting or compacting a file of that size.
In a two-file scenario, we would have a main file of 1 GB with almost all data and indexes, and a LOB file of 9 GB with LOBs only (not as bad as one file per LOB, and not as big as everything in one file).
>> What do you think of using only 2 files, one for all normal columns and another specialized for long data types (LOBs)?
> I thought about this, but I would try to avoid it if possible. What
> would be the advantage?
I can think of everything stated above, and more:
1) The main file will hold almost all indexes and data (except LOBs); for LOB columns, the column value in the main file is a reference into the LOB file.
2) The LOB file can have a different fileStore (much simpler and more specific), organized in variable-length extents or pages to take advantage of the sequential nature of its contents.
Such a fileStore only needs an avail-list and one index of pointers or references, to be used as column values in the main file;
like the old xBase .DBT files, which use a simple and very effective format, or the .tar file format, which was designed for sequential access devices (or streaming, in Java parlance).
So a locator can be implemented easily (at file level) as the LOB reference pointer + locator offset.
To compact the contents of extents (if needed), a stream-oriented method like deflate or gzip can be used without harming streaming.
Each extent can have a header with a tag marker, length, checksum, etc., to make recovery of a broken file easier.
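
A minimal sketch of such an extent layout, only to illustrate the idea.
All names here (LobExtentFile, the TAG value, the 12-byte header) are
invented for the example, the "file" is kept in memory, and the
avail-list for deleted extents is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical append-only LOB file made of variable-length extents.
// Each extent is: tag (4 bytes), length (4 bytes), CRC32 (4 bytes), data.
final class LobExtentFile {
    static final int TAG = 0x4C4F4245; // invented marker, ASCII "LOBE"
    static final int HEADER = 12;

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Appends one extent and returns its start offset (the LOB reference).
    long append(byte[] data) {
        long ref = out.size();
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        ByteBuffer header = ByteBuffer.allocate(HEADER);
        header.putInt(TAG).putInt(data.length).putInt((int) crc.getValue());
        out.write(header.array(), 0, HEADER);
        out.write(data, 0, data.length);
        return ref;
    }

    // Locator read: LOB reference pointer + offset inside the LOB.
    byte[] read(long ref, int offset, int length) {
        byte[] file = out.toByteArray();
        ByteBuffer header = ByteBuffer.wrap(file, (int) ref, HEADER);
        if (header.getInt() != TAG) {
            throw new IllegalStateException("No extent at " + ref);
        }
        int size = header.getInt();
        if (offset < 0 || length < 0 || offset > size - length) {
            throw new IndexOutOfBoundsException("Range outside the LOB");
        }
        int start = (int) ref + HEADER + offset;
        return Arrays.copyOfRange(file, start, start + length);
    }

    // Recomputes the checksum of the extent at ref and compares it with the header.
    boolean verify(long ref) {
        byte[] file = out.toByteArray();
        ByteBuffer header = ByteBuffer.wrap(file, (int) ref, HEADER);
        if (header.getInt() != TAG) {
            return false;
        }
        int size = header.getInt();
        int stored = header.getInt();
        CRC32 crc = new CRC32();
        crc.update(file, (int) ref + HEADER, size);
        return stored == (int) crc.getValue();
    }
}
```

The value stored in the main file would be the offset returned by
append(), and a partial read is just that offset plus the header size
plus the position inside the LOB.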
>> that can facilitate locator's implementation too
> How?
It is explained above, but again:
If the LOB fileStore is organized as a sequence of variable-length extents, with an index of pointers and a list of available (or deleted) extents,
a locator can be implemented easily (at file level) as the LOB reference pointer + the locator offset.
Streaming access to LOB contents would become simpler and would benefit too.
regards,
Dario.
> Fragmentation at file system (OS) level will have much more impact on large files, caching (at OS level) will be less effective too,
> read-ahead capabilities will be less effective too and finally IO load
> will increase inevitably.
Could you please provide links to back this up? Or provide a test case
that shows multiple small files are significantly faster than one
large file (given the same file operations)?
> It is easy to measure the degradation of the performance of a database as the data volume is significantly increased.
Do you think the database will be faster if you split it into multiple
files? I don't think so. But if you want, H2 supports the "split file
system". You can easily find out. This also has the advantage that all
files are about the same size (fewer files).
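
The general idea of splitting one logical file into equally sized parts
comes down to simple address arithmetic. The sketch below shows only
that arithmetic; the class SplitAddress, the part size and the part
naming scheme are assumptions for the example and not the actual split
file system code.

```java
// Illustrative address arithmetic for a file split into fixed-size parts.
final class SplitAddress {
    // Index of the part file that holds the given logical position.
    static long partIndex(long position, long partSize) {
        return position / partSize;
    }

    // Offset of the given logical position inside its part file.
    static long partOffset(long position, long partSize) {
        return position % partSize;
    }

    // Hypothetical naming scheme: the first part keeps the base name.
    static String partName(String base, long index) {
        return index == 0 ? base : base + "." + index + ".part";
    }

    public static void main(String[] args) {
        long partSize = 1L << 30; // 1 GB parts, an assumption
        long position = 3 * partSize + 5;
        System.out.println(partName("test.h2.db", partIndex(position, partSize))
                + " @ " + partOffset(position, partSize));
    }
}
```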
Regards,
Thomas
On 26/12/10 06:14, Thomas Mueller wrote:
>> Fragmentation at file system (OS) level will have much more impact on large files, caching (at OS level) will be less effective too,
> read-ahead capabilities will be less effective too and finally IO load
> will increase inevitably.
>
> Could you please provide links to back this up? Or provide a test case
> that shows multiple small files are significantly faster than one
> large file (given the same file operations)?
There is a lot of information about the efforts file systems make to mitigate the impact of fragmentation, and about smart caching strategies. All of this depends strongly on the OS and file system, but many factors are common.
This document has some interesting metrics: http://www.linuxsymposium.org/2006/filesys_frag_slides.pdf
With regard to H2 FileStore usage and fragmentation, "file scattering" may be the most interesting type of fragmentation.
For general background on this subject, start with http://en.wikipedia.org/wiki/File_system_fragmentation and http://en.wikipedia.org/wiki/File_sequence ; these pages have many references to technical documents and papers.
With regard to caching and read-ahead (or pre-fetch), note that this happens at hardware, file system and OS level. Read-ahead (at hardware level) is a way to reduce IO operations, mainly for sequential access patterns.
Large LOBs are ideal candidates for streaming (a sequential access IO pattern), in contrast to indexed table rows, which produce mainly random access IO patterns.
>> It is easy to measure the degradation of the performance of a database as the data volume is significantly increased.
> Do you think the database will be faster if you split it into multiple
> files? I don't think so.
Other DBMSs use many directories with many files as storage, but that isn't the point.
If you mean doing the same thing with many files, it will probably be worse.
I'm talking about separating storage into only 2 files with different FileStore implementations: one for general database metadata, table rows and indexes (good for a random access IO pattern), kept as small as possible,
and another file containing only LOB objects, organized to facilitate sequential access IO patterns (over each internal LOB object) and to reduce the size of the other file.
Motivation: in many applications we see that, for any table with LOB columns, only 1 in 4 queries (or fewer) retrieves the LOB columns. I know this can't be generalized, but it doesn't seem unreasonable to think that big LOBs are accessed less frequently than the rest of the common data type values in the same table.
Moreover, it is a common database design practice to use LOB-specific tables like (id, lob_value) to put all LOB values in one table, with master tables holding FK references only.
About performance and IO load: I don't have a proper benchmark, but I have an application in production at two sites, under similar conditions except for database size. We see up to a 20% difference in query performance.
I will try to use "sar", "iostat", etc. to analyze whether this is due to IO wait time or a change in cache hit rate, but it can be tricky to isolate the H2 load from the rest of the IO load.
> But if you want, H2 supports the "split file
> system". You can easily find out. This also has the advantage that all
> files are about the same size (less files)
Remember that we are discussing whether or not it is a good idea to store big LOB values together with the rest of the database objects in a single file when LobsInDatabase is used.
Anyway, the "split file system" can be useful for a performance comparison between a single large file and many fixed-size files.
regards,
Dario