Talking about UoWFile robustness


Paul Merlin

Oct 2, 2014, 5:41:31 AM
to qi4j...@googlegroups.com
Niclas Hedhman wrote:
> OK, but what happens if the JVM crashes during the 'move/delete' stage? My guess is that some files may contain changes and others not. The alternative is to copy the entire parent directory, modify it and then rename the directory, and concurrent access becomes a lot harder in that situation... Just asking... ;-)
It is indeed not bulletproof.

You can get an overview of the workflow here:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/uowfile/src/main/java/org/qi4j/library/uowfile/internal/UoWFileFactory.java#L135
   
Here is the UoWFile::apply() method:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/uowfile/src/main/java/org/qi4j/library/uowfile/internal/UoWFile.java#L73

And the UoWFile::rollback() method:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/uowfile/src/main/java/org/qi4j/library/uowfile/internal/UoWFile.java#L96


So, there are places where a JVM crash could do harm.

Between UoWFile lines 84 and 91: the current UoW would not be applied, but we could end up with inconsistent files, i.e. the original file renamed as ".backup" and the working file not moved. No data loss, but human intervention is needed to put things back in place.

If for some reason the UoW is rolled back, a crash between UoWFile lines 98 and 106 could lead to inconsistent files again. Same thing: no data loss, but human intervention is needed.
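
To make that failure mode concrete, here is a minimal sketch of how such leftovers could be detected after a crash. It assumes the backup sits next to the original and is named by appending ".backup" to the original file name. That naming, as well as the LeftoverBackupScanner class and its method, are assumptions made for illustration and are not taken from the UoWFile code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the UoWFile library.
class LeftoverBackupScanner {

    // Assumed naming convention: "<original name>.backup" in the same directory.
    static final String BACKUP_SUFFIX = ".backup";

    /** Returns the backup files in dir whose original file is missing. */
    static List<Path> findOrphanedBackups(Path dir) throws IOException {
        List<Path> orphans = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + BACKUP_SUFFIX)) {
            for (Path backup : stream) {
                String name = backup.getFileName().toString();
                String originalName = name.substring(0, name.length() - BACKUP_SUFFIX.length());
                Path original = backup.resolveSibling(originalName);
                // A backup without its original suggests an interrupted apply or rollback.
                if (!Files.exists(original)) {
                    orphans.add(backup);
                }
            }
        }
        return orphans;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("uowfile-scan");
        Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("a.txt.backup")); // backup next to its original
        Files.createFile(dir.resolve("b.txt.backup")); // original is missing
        System.out.println(findOrphanedBackups(dir));
    }
}
```

A scan like this only reports candidates; deciding whether to restore the backup or complete the move still needs a human or some extra state.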


When writing this I tried to limit the work done when applying the UoW to move/delete operations only. If all files are on the same filesystem, each move should be pretty atomic, which narrows the window of "oh damn, I'm in trouble".
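
As an illustration of the "same filesystem" point, the sketch below uses java.nio.file.Files.move with StandardCopyOption.ATOMIC_MOVE, which throws AtomicMoveNotSupportedException when the move cannot be performed atomically (for example across filesystems). Whether UoWFile uses this API is not stated in this thread; the AtomicReplace class and its moveInto method are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the UoWFile library.
class AtomicReplace {

    /**
     * Moves source to target. Returns true if the move was atomic,
     * false if it fell back to a plain (non-atomic) move.
     */
    static boolean moveInto(Path source, Path target) throws IOException {
        try {
            // With ATOMIC_MOVE, what happens when target already exists is
            // implementation specific: it is either replaced or an IOException is thrown.
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // Typically a cross-filesystem move: the crash window is wider here.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("uowfile-move");
        Path working = dir.resolve("data.txt.working");
        Path target = dir.resolve("data.txt");
        Files.write(working, "new content".getBytes(StandardCharsets.UTF_8));
        boolean atomic = moveInto(working, target);
        System.out.println("atomic=" + atomic + ", target exists=" + Files.exists(target));
    }
}
```

Note that an atomic move says nothing about when the change reaches durable storage, so a power loss can still leave surprises.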


As you suggested, there may be some room for improvements.
 
Hope this answers some of your questions :)

Cheers

/Paul


On Wed, Oct 1, 2014 at 2:58 PM, Paul Merlin <pa...@nosphere.org> wrote:
Niclas Hedhman wrote:
> You might also be interested in UoWFile library, which allows file
> system changes to be bound by UnitOfWork as well (although I think it
> is not crash recoverable yet. Paul?).
It should be pretty solid, even in case of JVM crash, as during a UoW
the original file remains untouched (copy/modify/move).

As stated in the UoWFile library documentation, note that it has a
performance impact proportional to the file size, as it duplicates the
file to keep a backup for a possible rollback. The API provides a way to
get non-managed handles on attached files to keep your read-only
operations fast.

/Paul


Niclas Hedhman

Oct 2, 2014, 8:50:09 PM
to qi4j...@googlegroups.com

Yes, sure.

Here are the things I can think of, though I'm not sure they are the best way to address the outstanding issues:

If a crash happens during apply/rollback, things are a bit nondeterministic, especially since it is hard to know in which order the OS persists the changes to durable media. For performance reasons, it isn't required to do so in the order they were requested.

   1. On entry to UoWFileFactory.beforeCompletion, I suggest (possibly configurable) first creating a "work log" that describes what is to be done. This "work log" is written to disk and a hard file sync is waited for, to provide the highest level of durability (there are still cases where it doesn't happen, though).

   2. On creation of UoWFileFactory, look for these "work log" files and do the "apply" and "rollback" accordingly. In fact, it could be the same routine as the one used in the 'normal' case, so that this code path gets exercised.

   3. Currently, IIUIC, the 'delete file' operation will not be 'visible' in such a restart, and a marker for this is also needed for the "work log", I think.
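
The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions: the WorkLog class, the ".worklog" suffix and the "OPERATION<TAB>path" entry format (including a DELETE marker) are all hypothetical and not part of UoWFile. The only JDK-specific piece is FileChannel.force(true), which asks the OS to flush file content and metadata to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical work log, not part of the UoWFile library.
class WorkLog {

    static final String SUFFIX = ".worklog"; // assumed file naming

    /** Writes one entry per line and forces the file to durable storage. */
    static void write(Path log, List<String> entries) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String entry : entries) {
            sb.append(entry).append('\n');
        }
        ByteBuffer buf = ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Hard sync of content and metadata. This does not sync the parent
            // directory entry, so a brand new log file may still be lost.
            ch.force(true);
        }
    }

    /** Reads the entries back, e.g. when the factory is created after a crash. */
    static List<String> read(Path log) throws IOException {
        return Files.readAllLines(log, StandardCharsets.UTF_8);
    }

    /** Lists the work logs left behind in dir by an interrupted completion. */
    static List<Path> findPending(Path dir) throws IOException {
        List<Path> logs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + SUFFIX)) {
            for (Path p : stream) {
                logs.add(p);
            }
        }
        return logs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("uowfile-worklog");
        Path log = dir.resolve("uow-1" + SUFFIX);
        // DELETE is the explicit marker suggested in point 3.
        write(log, Arrays.asList("APPLY\t/data/a.txt", "DELETE\t/data/b.txt"));
        for (Path pending : findPending(dir)) {
            System.out.println(pending.getFileName() + " -> " + read(pending));
        }
    }
}
```

In such a design the log would be deleted only after every listed operation has completed, so that its mere presence at startup signals unfinished work.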


I know these are relatively small matters, but they are worth considering if one wants to improve on the current behavior.


Cheers
Niclas




--
Niclas Hedhman, Software Developer
河南南路555弄15号1901室。
http://www.qi4j.org - New Energy for Java

I live here; http://tinyurl.com/3xugrbk
I work here; http://tinyurl.com/6a2pl4j
I relax here; http://tinyurl.com/2cgsug

Paul Merlin

Oct 4, 2014, 5:55:14 AM
to Niclas Hedhman, qi4j...@googlegroups.com
Niclas Hedhman wrote:
> Yes, sure.
>
> Things that I can think of, and not sure if it really is the best way
> to address the outstanding issues;
>
> IF a crash is happening during the apply/rollback, things are a bit
> nondeterministic, especially since it is hard to know in which order
> the OS is persisting the changes to durable media. It isn't required
> to do so in the same order it is requested, for performance reasons.
>
> 1. At entrance of UoWFileFactory.beforeCompletion, I suggest
> (possibly configurable) that first is a "work log" created, which
> describes what is to be done. This "work log" is written to disk, and
> a hard file sync is waited for, to provide the highest level of
> durability (there are still cases when it doesn't happen though)
>
> 2. On creation of UoWFileFactory, look for these "work log" files
> and do the "apply" and "rollback" accordingly. In fact, it could be
> the same routine as being used in the 'normal' case, to ensure
> "exercising" this codebase.
>
> 3. Currently, IIUIC, the 'delete file' operation will not be
> 'visible' in such a restart, and a marker for this is also needed for
> the "work log", I think.
>
>
> I know these are relatively small matters, but in case one wants to
> make it better than currently.
Looks like a good lead. Would you mind creating an issue for this so we
keep track of it?
