[Haskell-cafe] STM Finalizers

52 views
Skip to first unread message

Michael Schröder

unread,
Jul 30, 2015, 7:11:23 AM7/30/15
to haskel...@haskell.org
Hi everyone,

I want to finally announce some work I did as part of my master's thesis [1]. It's an extension to GHC's STM implementation. The main idea is to introduce a new primitive function

    atomicallyWithIO :: STM a -> (a -> IO b) -> IO b

Like the existing atomically operation, atomicallyWithIO performs an STM computation. Additionally, it takes a finalizer, which is an arbitrary I/O action that can depend on the result of the STM computation, and combines it with the transaction in such a way that:
  1. The finalizer is only performed if the STM transaction is guaranteed to commit.
  2. The STM transaction only commits if the finalizer finishes without raising an exception.


A motivation of why this is useful: 

Say we want to save the results of some STM transaction, i.e. persist them to disk. Currently, this is not possible in a transactionally safe way. The naive approach would be to first atomically perform some STM computation and then independently serialise its result:

    do 
        x <- atomically m
        serialize x

There are two issues with this. First, another thread could perform a second transaction and serialise it before we have finished serialising ours, which could result in an inconsistent state between the memory of our program and what is stored on disk. Secondly, the function serialize might not terminate at all; it could throw an exception or its thread could die. Again we would end up with an inconsistent state and possibly data loss.

We might try to simply move serialisation into the atomic block, but to do this we have to circumvent the types, which should already be a huge red flag:

    atomically $ do
        x <- m
        unsafeIOToSTM (serialize x)

The problem here is of course that an STM transaction can retry many times, which will also result in serialize being executed many times, which is probably not something we want. Furthermore, if the thread receives an asynchronous exception, the transaction will abort in an orderly fashion, while serialize, with its irrevocable side effects, cannot be undone.

But with finalizers, the solution is easy:

    atomicallyWithIO m serialize


I've written a small example application that uses finalizers and other stuff from my thesis to build a lightweight STM database framework: [2]

There are more possible uses cases besides serialisation (such as interactive transactions) and some interesting issues surrounding finalizers (like nesting of transactions) which are discussed in greater detail in my thesis [1], which also includes a formal operational semantics of the whole thing.

I have implemented finalizers in my fork of GHC [3], if any of you want to play around with them yourself. The atomicallyWithIO function is exported from the GHC.Conc.Sync module.


Cheers,
Michael

Roman Cheplyaka

unread,
Jul 30, 2015, 7:20:38 AM7/30/15
to haskel...@haskell.org
Michael,

This looks awesome! Will you try to merge your work into ghc?

I look forward to reading your thesis.

On 30/07/15 14:11, Michael Schröder wrote:
> Hi everyone,
>
> I want to finally announce some work I did as part of my master's thesis
> [1]. It's an extension to GHC's STM implementation. The main idea is to
> introduce a new primitive function
>
> atomicallyWithIO :: STM a -> (a -> IO b) -> IO b
>
> Like the existing atomically operation, atomicallyWithIO performs an STM
> computation. Additionally, it takes a *finalizer*, which is an arbitrary
> I/O action that can depend on the result of the STM computation, and
> combines it with the transaction in such a way that:
>
> 1. The finalizer is only performed if the STM transaction is guaranteed
> to commit.
> 2. The STM transaction only commits if the finalizer finishes without
> _______________________________________________
> Haskell-Cafe mailing list
> Haskel...@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>


signature.asc

Michael Schröder

unread,
Jul 30, 2015, 11:15:07 AM7/30/15
to haskel...@haskell.org
This looks awesome! Will you try to merge your work into ghc?

Thanks! Yes, I'd love to get this into GHC, assuming enough people are interested. I guess the next step would be to submit a patch to Phabricator?

M Farkas-Dyck

unread,
Jul 30, 2015, 11:21:16 AM7/30/15
to Michael Schröder, haskel...@haskell.org
On 30/07/2015 at 13:11:15 +0200, Michael Schröder wrote:
> atomicallyWithIO :: STM a -> (a -> IO b) -> IO b

Shiny!

On 30/07/2015 at 17:14:59 +0200, Michael Schröder wrote:
> assuming enough people are interested.

Not sure how many is enough, but count me interested ☺

Marcin Mrotek

unread,
Jul 30, 2015, 11:42:23 AM7/30/15
to M Farkas-Dyck, Haskell-Cafe
> assuming enough people are interested.

+1 from me, then.

Best regards,
Marcin Mrotek

Ben

unread,
Jul 30, 2015, 12:25:07 PM7/30/15
to Michael Schröder, haskel...@haskell.org
what happens if an exception is thrown? what are the retry semantics? is there a at-most-once guarantees for the IO action?

best regards, b

Michael Schröder

unread,
Jul 30, 2015, 5:19:34 PM7/30/15
to haskel...@haskell.org
what happens if an exception is thrown? 

In normal STM, if an exception occurs during a transaction, the transaction is gracefully aborted & none of its effects are made visible. Exactly the same thing is true in the presence of finalizers: if an exception occurs in the finalizer, the transaction is aborted & its effects discarded (you are still responsible for cleaning up any side effects that happened as part of the I/O action, of course).
 
what are the retry semantics? 

The finalizer is only run once the transaction is guaranteed to commit.

The STM part of the transaction might retry many times, but once the system determines there are no conflicts, it runs the finalizer. When the finalizer has finished, the transaction commits & makes its effects visible. If during the time the finalizer is running another transaction comes along and makes a conflicting write and tries to commit, the other transaction has to retry (in reality, the other transaction's thread is put to sleep and woken up once it actually has a chance to commit, i.e. once the finalizer of the first transaction has finished).
 
is there a at-most-once guarantees for the IO action?

Yes (see above).


A much more detailed description of all of this can be found in Sections 2.4 and 2.5 of my thesis [1], in which I give a complete formal semantics of STM with finalizers.

Ben

unread,
Jul 30, 2015, 5:31:59 PM7/30/15
to Michael Schröder, haskel...@haskell.org
looks very nice. thanks for the pointers.

best, ben

Felipe Lessa

unread,
Jul 30, 2015, 10:38:12 PM7/30/15
to haskel...@haskell.org
Em 30-07-2015 08:11, Michael Schröder escreveu:
> my master's thesis
[...]
> atomicallyWithIO :: STM a -> (a -> IO b) -> IO b

Isn't it beautiful? I wish mine could be summarized into a nice type
signature :).

Cheers,

--
Felipe.

signature.asc

David Turner

unread,
Jul 31, 2015, 4:14:44 AM7/31/15
to Michael Schröder, Haskell Cafe
Hi Michael,

Good work - what a lovely topic for your master's thesis! There's something a little bit odd about the real world being able to access the result of a transaction that hasn't yet (and may still yet not be) committed, but your arguments for why this is a good thing are compelling. It is also a bit odd that a STM transaction nested running within the finalizer won't see the results of the previous transaction as it's not yet committed. Not too surprising once you read your justification, but surprising beforehand.

A finalizer which has non-atomic real-world effects needs to be quite careful about undoing those effects when exceptions are thrown. For instance, looking at your example code[1]:

------------------------------------------------------------------------------

durably :: DatabaseHandle d -> TX d a -> IO a
durably h (TX m) = atomicallyWithIO action finalizer
  where
    action                 = runStateT m (database h, [])
    finalizer (a, (_,ops)) = serialize ops h >> return a

serialize :: [Operation d] -> DatabaseHandle d -> IO ()
serialize ops (DatabaseHandle _ h) =
    withMVar h (\h -> forM_ ops $ B.hPut h . runPut . safePut)

------------------------------------------------------------------------------


[1] https://github.com/mcschroeder/social-example/blob/8925056c/tx/TX.hs#L98

If some of those B.hPut calls succeed but then one fails (e.g. the disk is full) then the transaction will be rolled back, but the on-disk state will be left partially written. You're in good company with this problem, by the way - I've known large, expensive, commercial database products fail horribly and irretrievably in exactly this fashion when encountering a full disk at the wrong moment!

Even if the finalizer did include exception handling to deal with this situation, what happens with asynchronous exceptions? Does the finalizer run with async exceptions masked? I think it needs to: if not, then it seems it could run past the point where cleanup could occur and then receive an exception: essentially the finalizer runs to completion and then the STM transaction rolls back. Async exceptions get a brief mention in your thesis but I can't see anything about this point there - apologies if I've missed it.

Cheers,

David


Michael Schröder

unread,
Jul 31, 2015, 8:18:30 AM7/31/15
to Haskell Cafe
Hi David, I appreciate your interest in my thesis!

A finalizer which has non-atomic real-world effects needs to be quite careful about undoing those effects when exceptions are thrown. [...] If some of those B.hPut calls succeed but then one fails (e.g. the disk is full) then the transaction will be rolled back, but the on-disk state will be left partially written. 

Yes, you are absolutely right! The example application lacks some of the safeguards one would expect in a production-ready system. It was intended to be more of a demonstration of how easily one can in principle build a database-like system on top of STM using finalizers. It still requires some engineering effort to make it entirely safe and practical. I should have documented this better—or just gone the extra mile and actually made it safe!
 
Even if the finalizer did include exception handling to deal with this situation, what happens with asynchronous exceptions? Does the finalizer run with async exceptions masked? 

The finalizer does not run with async exceptions masked and you're right that one needs to be careful about how to deal with side-effects & async exceptions & cleanup within the finalizer—just like with any kind of I/O, really. In the TX example code, serialize should probably use withMVarMasked instead of withMVar. But I don't think the finalizer should run with async exceptions masked by default. There are uses cases where you really do want the finalizer to be interruptible from another thread, e.g. if you want to be able to timeout the whole transaction (STM part + finalizer part, as in: timeout n $ atomicallyWithIO ...)

David Turner

unread,
Jul 31, 2015, 9:07:18 AM7/31/15
to Michael Schröder, Haskell Cafe
Hi Michael,

I wholeheartedly agreed that timeout n $ atomicallyWithIO ... is important, but that would generally work even if the finalizer ran masked wouldn't it? Most actions that block are interruptible.

Put another way, I'd be interested to see how you could make serialize safe without this. Something like this:

serialize :: [Operation d] -> DatabaseHandle d -> IO ()
serialize ops (DatabaseHandle _ h) =
    withMVar h (\h -> forM_ ops $ B.hPut h . runPut . safePut)
  `onException`
    magicalFileRollback h

(ignoring the tedious implementation of magicalFileRollback) doesn't look quite right as if you got an async exception in the tiny window between onException exiting and serialize exiting then the STM transaction would roll back but the file wouldn't. I'm not convinced I totally understand async exceptions so if you (or anyone else) can point out why this works I'd be very grateful.

Cheers,

David




Michael Schröder

unread,
Jul 31, 2015, 2:12:33 PM7/31/15
to Haskell Cafe
Ah, of course you're right, I forgot that masking /= completely uninterruptible. And yes, it does seem like the finalizer needs to be masked by default, in order to prevent the kind of race condition you describe. Should be an easy change. Thanks for pointing that out!

Michael Schröder

unread,
Aug 3, 2015, 2:47:42 PM8/3/15
to Haskell Cafe
Async exceptions are now masked during the finalizer. Thanks again, David!

In addition to my GHC fork [1], I’ve now also made a patch file [2], which should apply cleanly on any recent version of HEAD. Apparently it’s rather tricky to initialise a GitHub-cloned GHC fork, as I’ve discovered. The patch file should make it easier to play around with the extension.


Cheers,
Michael

David Turner

unread,
Aug 4, 2015, 10:38:56 AM8/4/15
to Michael Schröder, Haskell Cafe
Ah, apologies, I missed this the first time around. Glad it helped.

Cheers,

David

Reply all
Reply to author
Forward
0 new messages