Interesting…
Assuming you’re not using a transactional file system, how do you handle the file system’s lack of guarantees? How certain are you that a kill -9 or unplugging the box will not lead to losing transactions already “committed”?
Cool - I was under the impression that a flush() not only kills performance, but also does not guarantee that the contents are physically on disk and would survive pulling the plug… it may be that I haven’t looked at that in a long time, but this topic is really interesting to me. Do you have some more info? Which Java class are you using to write to disk?
In fact, does anyone know of a safe and transactional log file-based implementation? It has been quite some time since I looked into this…
Can’t say about the future of this – you’d certainly have to compare it to MySQL with an in-memory engine, or a hashtable etc… but my motto is that evolution requires diversity, hence there is value in just having an alternative, otherwise I would turn off the laptop and go biking :)
Cheers,
Razie
It doesn't.
And then there's the hardware, which has its own buffers, some of which it may
and some of which it may not advertise to the outside world (i.e., the OS and
thus you), while any subset of the advertised buffers may or may not be
switched off.
I'm also interested in the steps taken to ensure that the data gets physically
on the disk, DESPITE all claims of the OS AND the disk of having done so, I
mean, did it REALLY get to the disk.....
-Martin
> And then there's the hardware, which has its own buffers, some of which
> it may and some of which it may not advertise to the outside world
> (i.e., the OS and thus you), while any subset of the advertised buffers
> may or may not be switched off.
Hardware RAID5 arrays are the poster child for that. They appear to the host
system as a small number of 'virtual' drives even when they have many
tens of physical drives and the controllers usually have on-board cache
and batteries to either hold the contents in RAM or (better) to power
the array for long enough for the data to be flushed to disk if the
power goes off. Then of course there's also the caches on the disks
themselves - the IDE, SATA or SCSI bus transaction may have completed
but the disk itself may still be caching the data, and if the power goes
off at that point your data is gone. Here's a random selection of links
from google explaining the issue in more detail:
http://www.jasonbrome.com/blog/archives/2004/04/03/writecache_enabled.html
http://milek.blogspot.com/2010/12/linux-osync-and-write-barriers.html
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/writebarrieronoff.html
http://linux.die.net/man/2/sync
> I'm also interested in the steps taken to ensure that the data gets
> physically on the disk, DESPITE all claims of the OS AND the disk of
> having done so, I mean, did it REALLY get to the disk.....
It's possible to ensure that e.g. by disabling the caches on disk drives
(plus lots of other necessary steps) but it nearly always has a
significant performance impact unless specific (and usually expensive)
steps are taken to circumvent it.
There's a pretty good description of how one filesystem (ZFS) handles
this at
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained.
Note in particular the use of Flash memory (SSDs in effect) to store
the filesystem intent logs - Flash is both fast and doesn't need power
to retain its content, so it is ideal for an intent log.
As the old saying goes "Fast, safe or cheap, pick any two". Solving
this sort of issue is why top-end hardware is so expensive, and why
large corporates are prepared to pay for it.
--
Alan Burlison
Do you guys figure the OS is messing with me? If you debug it, it seems that it's not messing with me: the contents actually appear in the file before the reader reads them...
It writes 1 million 50-character records at 52 thousand per second, on a laptop with an i7 and an SSD.
-----Original Message-----
From: scala...@googlegroups.com [mailto:scala...@googlegroups.com] On Behalf Of Alan Burlison
Sent: October-18-11 6:45 PM
To: Martin S. Weber
Cc: scala...@googlegroups.com
Subject: Re: [scala-user] Is 2K ACID TPS fast for a disk based (scala) database?
So... anyone here that can answer this? How is this done in a reasonably serious DB: Postgres, MySQL etc.? How do they make sure that the rollback logs are fail-safe after commit() returns? Do they really need some deep hooks into the BIOS of the HDD (not likely) or is there some simple dime-sized algorithm?
Interestingly enough - flush() does have an effect: using it after each write() drops the "performance" significantly, from 57kps to 35kps... so it does appear to be doing something... as to what exactly it does, that is probably a question of the actual semantics of the OS/FS, disk etc... in this particular case, the spectacular drop in performance makes me think it ended up straight on the disk, bypassing at least a few buffers...
On 10/18/11 11:43, William la Forge wrote:
Razvan,
Easy, at least for the small records datastore, which swift builds on. I
have two dedicated areas on disk which are written to alternately for
each transaction. Each area contains both a timestamp and a checksum. On
startup you read both and use the latest valid data.
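For anyone following along, here is a rough sketch of how I read the two-area scheme described above. It is not the actual datastore code: the class name TwoSlotStore, the 4096-byte slot size and the record layout are all made up for illustration, and a monotonically increasing sequence number stands in for the timestamp. Whether force(true) really reaches the platters still depends on the OS and the drive cache, as discussed elsewhere in this thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical illustration only: two fixed-size slots, written alternately.
final class TwoSlotStore {
    static final int SLOT_SIZE = 4096;     // assumed slot size
    static final int HEADER = 8 + 4 + 8;   // sequence number, payload length, checksum

    static final class Slot {
        final long seq;
        final byte[] payload;
        Slot(long seq, byte[] payload) { this.seq = seq; this.payload = payload; }
    }

    private final FileChannel ch;
    private long seq;                      // sequence number of the last valid commit

    TwoSlotStore(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        Slot latest = latestValid();
        seq = (latest == null) ? 0 : latest.seq;
    }

    // Checksum covers the sequence number, the length and the payload.
    static long checksum(long seq, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(12).putLong(seq).putInt(payload.length).array());
        crc.update(payload);
        return crc.getValue();
    }

    // Overwrites the slot that does NOT hold the latest commit, then forces it out.
    void commit(byte[] payload) throws IOException {
        if (payload.length > SLOT_SIZE - HEADER) throw new IllegalArgumentException("payload too large");
        long next = seq + 1;
        ByteBuffer buf = ByteBuffer.allocate(HEADER + payload.length);
        buf.putLong(next).putInt(payload.length).putLong(checksum(next, payload)).put(payload);
        buf.flip();
        long pos = (next % 2) * SLOT_SIZE;
        while (buf.hasRemaining()) pos += ch.write(buf, pos);
        ch.force(true);                    // ask the OS to push data and metadata to the device
        seq = next;
    }

    // Returns the slot contents if the checksum matches, otherwise null.
    private Slot readSlot(int index) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(SLOT_SIZE);
        long base = (long) index * SLOT_SIZE;
        while (buf.hasRemaining()) {
            if (ch.read(buf, base + buf.position()) < 0) break;   // end of file
        }
        buf.flip();
        if (buf.remaining() < HEADER) return null;
        long s = buf.getLong();
        int len = buf.getInt();
        long crc = buf.getLong();
        if (s <= 0 || len < 0 || len > buf.remaining()) return null;
        byte[] payload = new byte[len];
        buf.get(payload);
        return crc == checksum(s, payload) ? new Slot(s, payload) : null;
    }

    // On startup: read both slots and keep the newest one that validates.
    Slot latestValid() throws IOException {
        Slot a = readSlot(0), b = readSlot(1);
        if (a == null) return b;
        if (b == null) return a;
        return a.seq > b.seq ? a : b;
    }

    void close() throws IOException { ch.close(); }
}
```

The point of alternating is that a write torn by a crash can only damage the slot being written, so the previous commit in the other slot should still validate on restart.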
So you've turned off the disk's hardware cache, too?
-Martin
So your thing is now 70 times slower than when you started asking our opinion… you somehow seem happy though…?
Did we finally find what this forum is really good for?
:)
From: scala...@googlegroups.com [mailto:scala...@googlegroups.com] On Behalf Of William la Forge
Sent: October-18-11 11:16 PM
To: ScottC
Cc: scala-user
Postgres and some other databases have a mode where transaction
commits are not synchronous, which increases performance quite a bit
without data integrity loss -- but transactions may be lost in the
event of power failure or abrupt process or OS termination.
If you write transactions to a log in an appending fashion, and do not
overwrite data, it is possible to maintain data integrity without
synchronous writes to disk.
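A minimal sketch of that append-only idea, under my own assumptions: the class name AppendLog, the [length][CRC32][payload] framing and the MAX_RECORD limit are all hypothetical, not how Postgres or any other database lays out its log. Nothing is synced here, so the newest records can vanish on power loss, but recovery stops at the first record that fails validation and returns a consistent prefix.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration only: an append-only log with checksummed records.
final class AppendLog {
    static final int MAX_RECORD = 1 << 20;   // assumed sanity limit on record size

    static long crcOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue();
    }

    // Appends one record framed as [length:int][crc:long][payload].
    // Existing bytes are never overwritten. No fsync is issued.
    static void append(Path log, byte[] payload) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(log, StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
            out.writeInt(payload.length);
            out.writeLong(crcOf(payload));
            out.write(payload);
        }
    }

    // Reads records from the start and stops at the first torn or corrupt one.
    static List<byte[]> recover(Path log) throws IOException {
        List<byte[]> records = new ArrayList<>();
        if (!Files.exists(log)) return records;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(log)))) {
            while (true) {
                int len = in.readInt();
                long expected = in.readLong();
                if (len < 0 || len > MAX_RECORD) break;   // garbage header
                byte[] payload = new byte[len];
                in.readFully(payload);
                if (crcOf(payload) != expected) break;    // corrupt record: stop here
                records.add(payload);
            }
        } catch (EOFException endOfLog) {
            // Either a clean end of file or a record cut short by a crash.
        }
        return records;
    }
}
```

Reopening the file for every append is only there to keep the sketch short; a real log would keep the stream open.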
If you must guarantee that data is persisted at the end of a
transaction -- that the transaction will not be 'forgotten' -- then
you must do synchronous writes -- even with duplicate hardware. For
example, if you need to commit a financial transaction and confirm it
to a user, there is only one option: highly reliable storage and a
full flush to that storage.
If you can tolerate loss of some of the most recent transactions, you
can get much higher performance and use many tricks to reduce the
likelihood and volume of data loss, such as logging to multiple
servers or storage pools.
So - using flush() gives you protection against null pointers or out-of-memory errors killing your app, which is useful enough, while sync() gives you much better protection against pulling the plug...?
Obviously, neither is guaranteed to be 100% razie-proof.
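To make the flush() versus sync() distinction concrete, a small Java sketch, assuming a plain FileOutputStream wrapped in a BufferedOutputStream. The class and method names (DurableAppend, appendDurably) are made up for the example. flush() only moves bytes from the JVM buffer into the OS, while FileDescriptor.sync() asks the OS to write them to the device; whether the drive's own cache honours that is outside Java's control.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only.
final class DurableAppend {
    // Appends one line and returns after the OS has been asked to persist it.
    static void appendDurably(String path, String line) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path, true);   // true = append
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();         // JVM buffer -> OS cache: survives the JVM dying or a kill -9
            fos.getFD().sync();  // OS cache -> device (fsync): the slow part, aimed at power loss
        }
    }
}
```

Dropping the sync() call leaves only the flush() level of protection, which is presumably where most of the speed difference comes from.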
I found this beauty though: http://www.jboss.org/jbosstm/fileio
This was fun! Learned a lot! There are massive differences in performance between FileOutputStream and FileWriter for instance.