XADisk proof of correctness

92 views
Skip to first unread message

MarR

unread,
Sep 9, 2013, 5:24:11 PM9/9/13
to xad...@googlegroups.com
Hi, I am really appreciating all the work around XADisk.

I was searching for a "proof" of its correctness.
Commons-Transactions project developers state that it is impossibile to reach atomicity with ordinary file systems (see Status: http://commons.apache.org/proper/commons-transaction/).
Could you please help new adopters to detect any
pitfalls that might arise in case of application crash?
Thanks!

Nitin Verma

unread,
Sep 11, 2013, 2:16:32 PM9/11/13
to xad...@googlegroups.com, io.mario....@gmail.com
Hi Mario,

Thanks for the question.

XADisk can guarantee correct recoveries from the application (i.e. jvm) crashes. And in cases (in general, these should not arise; except when, for example, a file is open by some other application) it fails to perform certain io operations (in normal routine, or during crash recovery), it informs the application client, and disables itself until a user intervention would resolve the problem.

Certain tests have been run to test xadisk doing crash recoveries by "exit"ing the jvm at various points during the transaction (using com.sun.jdi.VirtualMachine.exit method) using another debugger jvm (the test is in src\org\xadisk\tests\correctness\TestCoreXAFileSystem.java; let me know if I can explain in detail about the test).

Regarding the Commons-Transactions, I did some searches for the technical details around why it was abandoned but could not find anything, except these links mentioning it at high-level...

http://mail-archives.apache.org/mod_mbox/commons-user/201003.mbox/%3C9da4f4521003050201l6f...@mail.gmail.com%3E

http://mail-archives.apache.org/mod_mbox/commons-dev/201003.mbox/%3C9da4f4521003280241r38...@mail.gmail.com%3E

An area where there are chances of incorrect results is power failure leading to durability problems. Disk-drives have cache and even with the operating-system issuing command to the disk-drive on writing the data on the disk, the disk-drive may delay the same. But such a problem would arise in databases too, with the same given disk-drive. Some sources:

http://lists.apple.com/archives/Darwin-dev/2005/Feb/msg00072.html

http://www.sqlite.org/speed.html ("SQLite calls fsync() after each synchronous transaction to make sure that all data is safely on the disk surface before continuing").

There are more details around this fsync part (which Java does during FileChannel.force calls on unix systems).

This is for "durability". XADisk uses in-memory locking for all file/directory operation to ensure a certain level of "isolation" among transactions.

It uses redo logs (and some undo logs) to make sure it is able to complete an incompletely committed/rolled-back transaction even after crash - that is, a transaction's operations complete in "atomicity". As I mentioned above, for rare failures in doing so, the application client is reported.

This description is not so detailed/technical to stand as a correctness proof. Please let me know if you want to discuss in detail on specific areas.

I will soon add a detailed note on the correctness topic on the xadisk website.

Thanks & Regards,
Nitin
Reply all
Reply to author
Forward
0 new messages