> storing a blob per file won't scale very well even if
> setting LOB_FILES_IN_DIRECTORIES to true.
Do you know an algorithm that would scale better?
(I have some ideas on how to avoid duplicate files, but that's probably
not an issue here).
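One approach that tends to scale better is to shard the LOB files across nested subdirectories keyed by the LOB id, so that no single directory ever holds more than a fixed number of entries. A minimal sketch of the idea (the class name, constant, and layout here are illustrative assumptions, not H2's actual code):

```java
import java.io.File;

// Sketch: map a numeric LOB id to a nested directory path so that no
// single directory holds more than FILES_PER_DIR entries per level.
// Names and constants are illustrative only.
class LobPath {
    static final int FILES_PER_DIR = 256;

    // e.g. id 70000 -> <base>/1/17/70000.lob (two shard levels)
    static String pathFor(long id, String baseDir) {
        long d1 = (id / FILES_PER_DIR) % FILES_PER_DIR;
        long d2 = id / ((long) FILES_PER_DIR * FILES_PER_DIR);
        return baseDir + File.separator + d2
                + File.separator + d1
                + File.separator + id + ".lob";
    }
}
```

With sequential ids (as discussed later in this thread), new files land in the same subdirectory until it fills up, which keeps directory listings short and cache-friendly.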
> Any idea on the time frame for this?
The current plan is to create a 1.1 branch later this year where such
features are enabled. Databases will not be fully backward compatible
(upgrade may be required). The plan is to use the same source code as
the 1.0 branch, only with default settings (for example
LOB_FILES_IN_DIRECTORIES enabled by default). I'm not sure when
exactly this will happen (maybe June).
Regards,
Thomas
What you describe is similar to storing the blob in the data file
itself. It should work well for medium-sized LOBs (maybe 100 bytes to
64 KB). Instead of adding a new storage mechanism, the LOBs could be
stored in a system table (SYSTEM_LOB or so) and then only read when
required.
The advantages of using individual files (when used for larger LOBs)
are: the space is reclaimed when deleting the LOB; it is easier to
concurrently write LOBs if the size is unknown; and incremental backup
(backing up only files created or modified since the last backup) is
possible.
> ValueLob.getNewObjectId() to return a sequence of increasing integers
I tried this as well and it looks promising. An optimization is to
cache the directory lists (using a small LRU cache). My prototype is
now faster than the default setting, even with a small number of
LOBs; I will continue to work on this.
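For reference, a small LRU cache for directory listings can be sketched in a few lines with Java's LinkedHashMap in access order. The class name and capacity below are illustrative assumptions, not the prototype's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a bounded LRU cache (e.g. for directory listings).
// LinkedHashMap with accessOrder=true moves entries to the end on
// each get/put; removeEldestEntry evicts the least recently used one.
class DirListCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    DirListCache(int maxSize) {
        super(16, 0.75f, true); // true = access-order iteration
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```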
Regards,
Thomas
> * Cache Efficiency - If the LOBs are stored inline, they'll take up
> lots of space in the database page cache. Perhaps this could be solved by
> adding something like the NOCACHE table option that Oracle supports?
NOCACHE is a good idea. I will add it to the roadmap (low priority right now).
> * Database Size - We'll be storing more than 256GB of LOB data.
> Perhaps this could be solved by supporting larger data files or
> multiple data files?
Multiple data files are already supported and used by the
LinearHashIndex (however, I will probably remove this index in the next
release, as it is very slow and unstable). Each data file has its own
cache and can have logging enabled or disabled.
> * Data File Performance - In our test case of inserting 100,000 10KB
> blobs into a table 30% of the time was spent in
> org.h2.store.DiskFile.allocate(). I suspect this problem could be
> fixed, but it still leaves all of the above issues.
30% is really bad, of course. I would like to fix this problem. Could
you post more information about your test? Maybe I need to re-write
DiskFile.
> so let me know if you
> have any words of advice.
In any case I would store very small blobs in-place, and use
individual files for objects larger than 128 KB or so. So your
solution is for medium-sized LOBs. It would be great if DiskFile can
be re-used (or replaced). I think there is a potential problem with
not logging and writing multiple objects to the same file: on power
failure old data could get corrupted. Not so with individual files, or
when using append-only files (for example tar files). But append-only
requires some kind of 'garbage collection'.
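The append-only idea above can be sketched as follows: each LOB is written as a length-prefixed record at the end of the file and addressed by its start offset. Deleting a LOB only leaves dead space behind, which is why a compaction ("garbage collection") pass copying live records to a fresh file would eventually be needed. The class and method names are illustrative assumptions, not H2 code:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: append-only LOB storage. Records are [int length][bytes],
// identified by their start offset. Existing data is never rewritten,
// so a crash can only damage the record currently being appended.
class AppendOnlyLobFile {
    private final RandomAccessFile file;

    AppendOnlyLobFile(String name) throws IOException {
        this.file = new RandomAccessFile(name, "rw");
    }

    // Append a LOB and return its offset (the stable id of the record).
    long append(byte[] data) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.writeInt(data.length);
        file.write(data);
        return offset;
    }

    byte[] read(long offset) throws IOException {
        file.seek(offset);
        byte[] data = new byte[file.readInt()];
        file.readFully(data);
        return data;
    }
}
```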
Regards,
Thomas
> Other than using the LinearHashIndex (which is going away), I assume
> there's no way to use multiple data files?
What I mean is, you can use multiple data files, and you can have a
look at how to do that in LinearHashIndex.
> I've filed a bug with a test case and profile report to track this
> issue: http://code.google.com/p/h2database/issues/detail?id=13
I will have a look, thanks!
> Could you explain the corruption problem in more detail?
Hard drives don't always write in the same order as the write methods
were called (elevator optimization and so on). Also, there is usually a
write cache (writes are delayed). File systems have algorithms to
detect this and recover in most situations, but if you write to a file
without logging your changes, there might be a problem.
> not doing lob updates in place and syncing the LOB data files on
> commit would avoid many of these issues.
Append-only should be quite safe; however, update-in-place without a
log file is quite dangerous. Basically, you would need to test it. I
think it can be done, but at least checksums should be used (checksums
are already used for DiskFile), and recovery algorithms must be
implemented.
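The per-record checksum idea can be sketched with java.util.zip.CRC32: store a CRC-32 alongside each record, recompute it on read, and treat a mismatch as a torn write that recovery must discard. This standalone helper is just an illustration of the principle, not how DiskFile actually implements it:

```java
import java.util.zip.CRC32;

// Sketch: checksum a record on write, verify on read. A mismatch
// means the record was only partially written (e.g. power failure
// mid-write) and must be discarded during recovery.
class Checksummed {
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    static boolean verify(byte[] data, long storedCrc) {
        return checksum(data) == storedCrc;
    }
}
```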
Regards,
Thomas