ANN: Durable refs with ACID guarantees - Phase I

Alyssa Kwan

Sep 22, 2010, 11:21:47 PM
to Clojure
This is in reference to http://groups.google.com/group/clojure/browse_frm/thread/d6deac0c34d0ce28/5c1e11ec2bd52bde.

Hi everyone!

I have hacked the Clojure core to add durability to refs. The syntax
to create these is (dref <val> <key> <path>), where <key> and <path>
are strings. Then you use them just like refs. Creating a dref
creates a global identity, such that subsequent dref calls to the same
key and path will get the same dref. On subsequent dref calls, the
<val> is ignored and the persisted value is used, including in
subsequent VM instances.
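Below is a minimal Java sketch of the get-or-create semantics just described. It is meant for reasoning about them without building the fork. It is not the code in DRef.java: `Store`, `DRefSketch`, and `DRefRegistry` are hypothetical names, and an in-memory map stands in for BDB JE so the example runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a BDB JE database (not the fork's code).
final class Store {
    private final Map<String, Object> data = new ConcurrentHashMap<>();
    Object get(String id) { return data.get(id); }
    void put(String id, Object value) { data.put(id, value); }
}

// Hypothetical durable ref: an identity plus its current in-memory value.
final class DRefSketch {
    final String id;
    volatile Object value;
    DRefSketch(String id, Object value) { this.id = id; this.value = value; }
}

// Hypothetical registry giving (dref <val> <key> <path>) its get-or-create behavior.
final class DRefRegistry {
    private final Store store;
    private final Map<String, DRefSketch> live = new ConcurrentHashMap<>();

    DRefRegistry(Store store) { this.store = store; }

    // 'initial' must be non-null here because ConcurrentHashMap rejects null values.
    DRefSketch dref(Object initial, String key, String path) {
        String id = path + "\0" + key; // NUL separator, assumed absent from path and key
        // Same key and path in this VM: always the same object (global identity).
        return live.computeIfAbsent(id, k -> {
            Object persisted = store.get(k);
            if (persisted != null) {
                return new DRefSketch(k, persisted); // persisted value wins; 'initial' ignored
            }
            store.put(k, initial);                    // first creation ever: persist 'initial'
            return new DRefSketch(k, initial);
        });
    }
}
```

A second `DRefRegistry` sharing the same `Store` plays the part of a later VM instance. It gets a new object, but that object carries the persisted value rather than the new `<val>`.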

Get it here: git://github.com/kwanalyssa/clojure.git

1. <path> refers to BDB JE databases which get created in the "data"
directory of your project.
2. BDB JE is used. I'm ignorant of IP and licensing issues. Making
BDB JE core to Clojure is probably an issue.
3. Currently only a subset of Clojure primitive types is supported.
No BigDecimal or Ratio yet. See the comprehensive list (and
serialization mappings) at the bottom of
src/jvm/clojure/lang/DRef.java. BDB JE TupleBindings are used.
Submissions welcome, especially for the persistent data structures.
4. How do we approach the problem of storing objects with lexical
environments?
5. Unit tests welcome! I didn't do TDD since the work is in Java and
there are no Java-level tests in the project. Please add your own in
test/clojure/test_clojure/drefs.clj. I'm new to concurrency, so tests
along those lines would be awesome!
6. The ACID part is not really guaranteed!!! The STM currently uses a
one-phase commit. I inserted two-phase commits to the data stores in the
middle of that one-phase commit, so there's a remote possibility that the
STM's in-memory changes fail AFTER the data has been written to disk. It's
REALLY remote, but it is possible. The STM itself would have to use 2PC to
make this airtight. That's way beyond my current grasp of both
concurrency and the Clojure implementation.
7. I was aiming for an API where <path> is optional. However, I
didn't want to stray from the ref API, which has variable arity.
Suggestions on how to reconcile the two are welcome!
8. To maintain global identity, I use a static cache, which requires
non-hard references to avoid OOM issues. This is my first time doing
this, so please check my code to make sure that I'm doing it right.
I'm using SoftReferences, though WeakReferences may be better for real-
life usage patterns. Let me know!
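For point 8, one common shape for a soft-reference identity cache is sketched below. It is an illustration under stated assumptions, not a review of the fork's code. `SoftIdentityCache` is a hypothetical class, and the `factory` argument stands in for whatever builds or loads a dref. A `ReferenceQueue` lets the cache remove entries whose referents were cleared, so the map does not keep one stale entry for every key it has ever seen.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical identity cache: at most one live object per key, held only softly so
// the GC can reclaim entries nobody is using when memory runs low.
final class SoftIdentityCache<V> {
    // A SoftReference that remembers its key so the stale map entry can be removed later.
    private static final class Entry<T> extends SoftReference<T> {
        final String key;
        Entry(String key, T value, ReferenceQueue<? super T> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final ConcurrentHashMap<String, Entry<V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    V getOrCreate(String key, Function<String, V> factory) {
        expungeStale();
        while (true) {
            Entry<V> old = map.get(key);
            V v = (old == null) ? null : old.get();
            if (v != null) return v;                 // existing object is still alive
            V created = factory.apply(key);
            Entry<V> fresh = new Entry<>(key, created, queue);
            // Install only if the slot is still empty or still holds the cleared entry
            // we saw, so racing threads converge on a single object per key.
            boolean installed = (old == null)
                    ? map.putIfAbsent(key, fresh) == null
                    : map.replace(key, old, fresh);
            if (installed) return created;
            // Another thread won the race; loop to pick up its object.
        }
    }

    // Remove map entries whose referents the GC has already cleared.
    private void expungeStale() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            Entry<?> e = (Entry<?>) ref;
            map.remove(e.key, e); // no-op if the key was already re-populated
        }
    }

    int size() {
        expungeStale();
        return map.size();
    }
}
```

Swapping `SoftReference` for `WeakReference` changes only how long unused objects stay cached. In this sketch, identity is safe either way. An entry can only be cleared once nothing strongly holds the old object, so no caller can see two objects for the same key at once. Under contention the factory may run more than once, so it should be cheap or free of side effects.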

Please dig in! Feedback appreciated!
Alyssa Kwan

Per Vognsen

Sep 23, 2010, 8:27:54 AM
to clo...@googlegroups.com
Cool!

I'm getting back to Clojure after an extended absence. Just today I
was pondering the design of a solution to a similar problem, though I
suspect our requirements diverge on several points. My tentative
conclusion was that it could be done entirely in Clojure and without
modifying existing code. Maybe you can poke holes in my fledgling plan
since you've obviously been thinking about this sort of problem longer
than I have:

There's a new pref reference type. It consists of a key and an atom
containing nil for unloaded objects and an STM reference for loaded
objects.

When a pref is dereferenced, it checks its atom. If nil, it first
loads the object from disk into a fresh STM reference (which has a
metadata field pointing back to the pref) and mutates the atom so it
points to it. In either case it finishes by dereferencing the STM
reference.

When a pref is mutated, it first goes through the same motions as for
dereferencing. Then it simply forwards the mutation to the underlying
STM reference.
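Per's load-on-demand and forwarding steps can be sketched in Java, with an `AtomicReference` playing the atom that holds nil or the loaded reference. `Pref`, `LoadedCell`, and the `loader` function are hypothetical. A plain volatile field stands in for the STM ref, so the sketch shows only the loading and forwarding logic, not transactional behavior.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical stand-in for the STM ref a loaded pref points to. The 'owner' field plays
// the role of the metadata back-pointer from the STM reference to its pref.
final class LoadedCell<V> {
    final Pref<V> owner;
    volatile V value;
    LoadedCell(Pref<V> owner, V value) { this.owner = owner; this.value = value; }
}

// Hypothetical pref: a key plus an atomic slot that stays null until first use.
final class Pref<V> {
    final String key;
    private final Function<String, V> loader;   // assumed "load from disk" function
    private final AtomicReference<LoadedCell<V>> slot = new AtomicReference<>();

    Pref(String key, Function<String, V> loader) { this.key = key; this.loader = loader; }

    // Shared first step of deref and mutation: load on demand, then return the cell.
    LoadedCell<V> cell() {
        LoadedCell<V> c = slot.get();
        if (c == null) {
            LoadedCell<V> fresh = new LoadedCell<>(this, loader.apply(key));
            // If another thread installed a cell first, discard ours and use theirs.
            c = slot.compareAndSet(null, fresh) ? fresh : slot.get();
        }
        return c;
    }

    V deref() { return cell().value; }

    // Mutation is forwarded to the loaded cell. In Clojure this would be ref-set/alter
    // on the underlying STM ref inside a transaction, which this sketch does not model.
    void set(V newValue) { cell().value = newValue; }
}
```

If two threads dereference an unloaded pref at the same moment, both may call the loader. Only one result is installed, though, so every caller ends up with the same cell. A lock around the load would avoid the duplicate disk read, at the cost of blocking.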

Watchers are installed on STM references backed by prefs. Thus we are
notified when something is mutated.

There is a pref-specific transaction boundary form called atomic,
analogous to dosync. The watchers are used to determine which prefs
were mutated during the transaction so as to flag them dirty for
write-back or write-through caching; this is why we need the pref back
reference in the metadata.
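The dirty-flag bookkeeping behind `atomic` might look roughly like the following Java sketch. `WatchedCell`, `PrefTx`, and `RECORD` are hypothetical names. The sketch assumes watchers fire only for committed changes, on the committing thread, and it does not model retries or rollback.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical watchable cell standing in for an STM ref that backs a pref.
final class WatchedCell {
    final String prefKey;                         // which pref this cell belongs to
    private volatile Object value;
    private final List<BiConsumer<WatchedCell, Object>> watchers = new CopyOnWriteArrayList<>();

    WatchedCell(String prefKey, Object value) { this.prefKey = prefKey; this.value = value; }
    void addWatch(BiConsumer<WatchedCell, Object> w) { watchers.add(w); }
    Object get() { return value; }

    // Assumed to be called only for committed changes; notifies watchers with the old value.
    void set(Object newValue) {
        Object old = value;
        value = newValue;
        for (BiConsumer<WatchedCell, Object> w : watchers) w.accept(this, old);
    }
}

// Hypothetical "atomic" boundary that turns watcher notifications into dirty flags.
final class PrefTx {
    private static final ThreadLocal<Set<String>> touched = new ThreadLocal<>();
    static final Set<String> dirty = Collections.synchronizedSet(new LinkedHashSet<>());

    // Watcher installed on every pref-backed cell: record the pref key if inside atomic.
    static final BiConsumer<WatchedCell, Object> RECORD = (cell, oldValue) -> {
        Set<String> s = touched.get();
        if (s != null) s.add(cell.prefKey);
    };

    static void atomic(Runnable body) {
        Set<String> s = new LinkedHashSet<>();
        touched.set(s);
        try {
            body.run();       // a real version would wrap this in dosync
            dirty.addAll(s);  // reached only if the body completed normally
        } finally {
            touched.remove();
        }
    }
}
```

The thread-local set keeps concurrent `atomic` blocks from recording each other's writes. Changes made outside any `atomic` block are not flagged.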

Anyway, even assuming this all works, it will obviously be less
computationally efficient than extending LockingTransaction.java with
special support.

-Per

Alyssa Kwan

Sep 23, 2010, 9:16:00 AM
to Clojure
It's probably possible to do it completely in Clojure, but you have to
subclass Atom. There's no need for any transaction boundary; you just
have to make sure that compareAndSet does a durable swap.
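A durable compare-and-set along these lines could be sketched in Java as below. `Datom` and `DurableStore` are hypothetical names, and `DurableStore.write` is assumed to return only after the value is safely on disk. `clojure.lang.Atom` is lock-free; this version takes a lock instead, because the compare, the disk write, and the in-memory publish have to happen as one step.

```java
import java.util.function.UnaryOperator;

// Hypothetical durable store; write() is assumed to return only once the value is durable.
interface DurableStore<V> {
    void write(String key, V value);
}

// Hypothetical "datom": each successful compareAndSet is its own auto-committed transaction.
final class Datom<V> {
    private final String key;
    private final DurableStore<V> store;
    private V value; // initial value is not persisted here, to keep the sketch short

    Datom(String key, V initial, DurableStore<V> store) {
        this.key = key;
        this.value = initial;
        this.store = store;
    }

    synchronized V deref() { return value; }

    // Compare, persist, then publish, all under one lock.
    synchronized boolean compareAndSet(V expected, V next) {
        if (value != expected) return false;   // identity comparison, as Atom does
        store.write(key, next);                // if this throws, memory is left unchanged
        value = next;
        return true;
    }

    // swap!-style retry loop built on the durable compareAndSet.
    V swap(UnaryOperator<V> f) {
        while (true) {
            V old = deref();
            V next = f.apply(old);
            if (compareAndSet(old, next)) return next;
        }
    }
}
```

If the write throws, the in-memory value stays unchanged, so memory never gets ahead of disk. The cost is that all writers to one datom are serialized on its lock.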

My plan was to get durable refs done and then extend the other mutable
identities, including atom. I'd love to work with you on it!

Dragan Djuric

Sep 23, 2010, 9:09:15 PM
to Clojure
What is the performance penalty?

Per Vognsen

Sep 23, 2010, 11:05:05 PM
to clo...@googlegroups.com
On Thu, Sep 23, 2010 at 8:16 PM, Alyssa Kwan <alyssa...@gmail.com> wrote:
> There's no need for any transaction boundary; you just
> have to make sure that compareAndSet does a durable swap.

I had the chance to read your code today. You have a transaction
boundary in DRef.set() which is called by LockingTransaction.run() at
commit time. My point was that if you weren't intrusively modifying
LockingTransaction.java you would need to take care of that somewhere
else, the most obvious place being a dosync wrapper form. All you
would need is a seq of 'vals' returned on a committed run(). That would
also be useful for application-side transaction logging, etc.

-Per

Alyssa Kwan

Sep 23, 2010, 11:14:20 PM
to Clojure
I haven't benchmarked... I don't have much experience with
benchmarking. Assistance would be greatly appreciated!
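As a starting point for Dragan's question, here is a crude timing harness in Java. It does not use BDB JE. An 8-byte write plus `FileChannel.force` on a temp file is a hypothetical stand-in for a durable commit, and an `AtomicLong` update stands in for an in-memory ref commit. For numbers worth publishing, a harness such as JMH would be more trustworthy, since it handles JIT warm-up and dead-code elimination.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Crude timing harness (hypothetical); not a substitute for a real benchmark tool.
final class CrudeBench {
    // Average ns per in-memory update: stand-in for a plain in-memory ref commit.
    static double inMemoryNanos(int n) {
        AtomicLong cell = new AtomicLong();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) cell.set(i);
        return (System.nanoTime() - start) / (double) n;
    }

    // Average ns per 8-byte write plus force(): stand-in for a durable commit.
    static double durableNanos(int n) throws IOException {
        Path tmp = Files.createTempFile("dref-bench", ".dat");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                buf.clear();
                buf.putLong(0, i);                     // absolute put: position stays 0
                while (buf.hasRemaining()) {
                    ch.write(buf, buf.position());     // overwrite the same 8 bytes at offset 0
                }
                ch.force(true);                        // ask the OS to flush to the device
            }
            return (System.nanoTime() - start) / (double) n;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Call both methods a few times and discard the first results to give the JIT a chance to warm up. The ratio of the two averages is a rough indication of per-commit overhead, not a measurement of drefs themselves.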

Alyssa Kwan

Sep 23, 2010, 11:40:42 PM
to Clojure
Ah. I thought we were discussing prefs, or datoms (durable atoms), as
I would call them. Because datoms are only synchronous but not
coordinated, there's no transaction boundary. (More accurately, the
swap! is the transaction boundary, much like auto-commit.) dosync has
no effect on datoms.

drefs, being coordinated, do require a transaction boundary. However,
I don't think it's possible without modifying LockingTransaction.
It's bad enough that the current implementation has 2PC against ACID
resources wrapped inside of a 1PC STM transaction. To place the
durable write outside of the 1PC would be much less safe. dosync
enforces a global transaction order. If writes were outside
LockingTransaction.run(), the order could (and probably would) be
different between in-memory resources and durable resources. For
ultimate safety, we need to be even more intrusive and add a prepare
phase to the STM.
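The prepare phase mentioned above is the first half of a standard two-phase commit. A minimal Java sketch of the coordinator follows. `Participant` and `TwoPhaseCommit` are hypothetical names; in this picture, the STM and each durable store would all be participants. A production coordinator would also log its commit decision durably so it could finish phase 2 after a crash. This sketch omits that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant: the in-memory STM or a durable store.
interface Participant {
    boolean prepare();  // vote yes only if commit() is then guaranteed not to fail
    void commit();
    void rollback();
}

final class TwoPhaseCommit {
    // Returns true if every participant committed, false if the transaction was aborted.
    static boolean run(List<Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        // Phase 1: ask everyone to prepare; stop at the first "no" (or exception).
        for (Participant p : participants) {
            boolean yes;
            try {
                yes = p.prepare();
            } catch (RuntimeException e) {
                yes = false;
            }
            if (!yes) {
                // Abort: undo every participant that already voted yes. A participant
                // that votes no is assumed to have cleaned up after itself.
                for (Participant q : prepared) q.rollback();
                return false;
            }
            prepared.add(p);
        }
        // Phase 2: everyone voted yes, so the decision is final; tell them all to commit.
        for (Participant p : prepared) p.commit();
        return true;
    }
}
```

No participant commits until every participant has voted yes. So an in-memory failure can no longer happen after the disk write is made permanent, provided each participant truly cannot fail once it has prepared.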

Per Vognsen

Sep 23, 2010, 11:52:33 PM
to clo...@googlegroups.com
This probably comes back to divergent requirements. Strict durability
is much too expensive for what I need to do. For me the more important
thing is that whatever authoritative data lives on disk is consistent
with the application transaction boundaries. This means that I need to
tag persistent refs as dirty or increment a version number when an STM
transaction commits, so that when they are evicted from cache or a
consistent snapshot is written to disk, I know what to write out.
Another simplifying requirement is that I don't have to worry about
different database domains for persistent refs. The application is
organized around a single database that serves as a persistent store
for application domain data.
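The version-number approach can be sketched in Java as follows. `VersionedCache` and its methods are hypothetical names. A single lock stands in for the STM's commit ordering, and that lock is what makes each flush see the state at a transaction boundary. Holding the lock during the disk write blocks commits for that time, which a real implementation would probably want to avoid.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical write-back cache: each commit bumps a global version and stamps the refs
// it changed; a flush writes only refs stamped after the last successful flush.
final class VersionedCache {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, Long> stamps = new HashMap<>();
    private long version = 0;          // bumped once per committed transaction
    private long flushedVersion = 0;   // highest version known to be on disk

    // Apply one transaction's writes; the lock orders commits with respect to flushes.
    synchronized void commit(Map<String, Object> writes) {
        version++;
        for (Map.Entry<String, Object> e : writes.entrySet()) {
            values.put(e.getKey(), e.getValue());
            stamps.put(e.getKey(), version);
        }
    }

    synchronized Object get(String key) { return values.get(key); }

    // Hand every ref changed since the last flush to the writer as one consistent snapshot.
    synchronized void flush(Consumer<Map<String, Object>> writeToDisk) {
        Map<String, Object> snapshot = new HashMap<>();
        for (Map.Entry<String, Long> e : stamps.entrySet()) {
            if (e.getValue() > flushedVersion) snapshot.put(e.getKey(), values.get(e.getKey()));
        }
        writeToDisk.accept(snapshot);  // if this throws, flushedVersion is not advanced
        flushedVersion = version;
    }
}
```

The version only advances after `writeToDisk` returns. A failed flush therefore leaves the same refs pending for the next attempt.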

-Per
