Clojure + Terracotta

Rich Hickey

Oct 18, 2008, 8:50:32 AM

to clo...@googlegroups.com

On Fri, Oct 17, 2008 at 8:01 PM, Luc Prefontaine
<lprefo...@softaddicts.ca> wrote:
> I am not very far from tackling this issue. In our bus messaging system, we
> are using Terracotta with some Java components
> and it's a matter of weeks before we start to investigate how we can bridge
> Clojure and Terracotta.
>
> A customer asked us about some new functionality today and I see a need to
> fill the Terracotta/Clojure gap
> somehow.
>
> I'll come back toward the end of November with a proposal.
>
> Any comments Rich about how you would see this integration and what Clojure
> semantics you would like to share through Terracotta ?
> I might enlarge the scope beyond what we need in our system even if not all
> the pieces are delivered in the very short term.
>

There are lots of potential synergies. I think that one key benefit of
using Clojure is that the immutable data structures can be shared, yet
read outside of locks. As you know, Terracotta requires shared objects
to be accessed under a lock. However, once the object has been
propagated, it need not be accessed under a lock iff it is immutable.
This was one of the first things I verified with Terracotta.

So, for instance, you can do the normal Terracotta cache thing and put
Clojure's persistent data structures in a ConcurrentHashMap shared
root. Once you pull one out of the map, you can use it henceforth
without locking - a huge benefit IMO. Plus, since the data structures
share structure, updates are also efficient. A current hitch, which I
am looking to enhance anyway, is that some of the data structures do
lazy hash creation with volatile caches. In-proc this is no problem,
nor out of proc since the hash value is a pure function of the
immutable structure value, but I think Terracotta may not be happy
with the volatile members. I have already on my todo list moving to
incrementally calculated hash values (had to wait for the unification
of hash logic with java.util's, now done).
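
A minimal sketch of the cache pattern described above, under loud
assumptions: it runs on a single plain JVM with no Terracotta
configuration, and it uses unmodifiable java.util lists as a stand-in
for Clojure's persistent data structures (so updates here copy rather
than share structure). SharedCache, publish, snapshot and append are
hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a shared cache root. In a clustered setup the
// map would be the shared root; here it is an ordinary in-process map.
final class SharedCache {
    private final ConcurrentHashMap<String, List<String>> root =
            new ConcurrentHashMap<>();

    // Store an immutable copy, so readers can never observe a mutation.
    void publish(String key, List<String> value) {
        root.put(key, Collections.unmodifiableList(new ArrayList<>(value)));
    }

    // Pull the current value out of the map. The returned list is immutable,
    // so the caller can keep using it without any further locking.
    List<String> snapshot(String key) {
        List<String> value = root.get(key);
        return value == null ? Collections.<String>emptyList() : value;
    }

    // An "update" builds a new immutable value and swaps it in atomically.
    // Unlike a Clojure persistent vector, this copies the whole list.
    void append(String key, String item) {
        root.merge(key, Collections.singletonList(item), (old, added) -> {
            List<String> next = new ArrayList<>(old);
            next.addAll(added);
            return Collections.unmodifiableList(next);
        });
    }
}

class SharedCacheDemo {
    public static void main(String[] args) {
        SharedCache cache = new SharedCache();
        cache.publish("orders", Collections.singletonList("o-1"));
        List<String> before = cache.snapshot("orders"); // lock-free from here on
        cache.append("orders", "o-2");
        System.out.println(before + " / " + cache.snapshot("orders"));
    }
}
```

The point of the sketch is only that a value taken from the map stays
valid and unchanged while later writers replace the entry; it says
nothing about how Terracotta itself faults or locks the values.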

Of course, this leaves you with ConcurrentHashMap's last-one-in-wins,
no coordinated activity semantics. When that's not good enough, you'll
want to try STM refs + transactions. Here too, I think a lot is
possible. As I've said, I once had this working, but haven't tracked
whether or not all mechanisms I am using are supported by Terracotta.
Underneath the STM are all standard Java concurrency things -
reentrant locks, wait/notify, java.util.concurrent.atomic stuff etc.
To the extent these are supported, it should function correctly right
away.

That said, there's one aspect of the STM I think could be tweaked for
a clustering situation. The only thing that is shared between all
transactions is a single CAS for the timestamps. In-proc, that works
fine until you get to very heavy micro-transactions, which are a bad
idea anyway. On a Terracotta cluster, that CAS will turn into a shared
lock, I think, which is much heavier. What you really want is a
central getAndIncrement server, since this capability can be done
completely with a server-side CAS with no client coordination.
Terracotta, being API-free, will want to maintain the illusion that
each JVM has its own CAS. If I had time to do Terracotta/Clojure work,
I'd probably investigate abstracting out the STM's timestamp generator
to allow for a timestamp server for clustered situations.
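
One way the timestamp-generator abstraction could look, as a sketch
only. TimestampSource, LocalTimestampSource, ServerTimestampSource and
RemoteCounter are hypothetical names, not part of Clojure's STM or of
Terracotta, and the remote call is left as an interface to be supplied
by whatever transport a cluster would actually use.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical seam: the STM would ask this for its next timestamp
// instead of touching a counter directly.
interface TimestampSource {
    long next();
}

// In-process case: a single CAS-backed counter shared by all transactions.
final class LocalTimestampSource implements TimestampSource {
    private final AtomicLong counter = new AtomicLong();

    public long next() {
        return counter.incrementAndGet();
    }
}

// Assumed shape of a central getAndIncrement service. The increment happens
// entirely on the server, so clients need no coordination among themselves.
interface RemoteCounter {
    long getAndIncrement();
}

// Clustered case: delegate to the central counter.
final class ServerTimestampSource implements TimestampSource {
    private final RemoteCounter server;

    ServerTimestampSource(RemoteCounter server) {
        this.server = server;
    }

    public long next() {
        // +1 so both sources start at 1, matching incrementAndGet above.
        return server.getAndIncrement() + 1;
    }
}
```

For a local test the server can be faked with an AtomicLong, e.g.
`new ServerTimestampSource(new AtomicLong()::getAndIncrement)`.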

Once you have that, you can make normal locality-of-reference based
architectural decisions for the refs, and get concurrency that is
scalable strictly on the transactions' degree of ref overlap.

Rich

Luc Prefontaine

Oct 18, 2008, 10:55:35 PM

to clo...@googlegroups.com

Ok, I'll digest this in the next couple of weeks and have a look more closely
at Clojure code and Terracotta's capabilities.

I have a non-linear thinking process so it helps to have some early goals.
I'll create some prototype setup here to help identify issues and run
some basic tests.

Thank you,

Luc

Alex Miller

Oct 19, 2008, 8:07:08 PM

to Clojure

Rich, I'm the tech lead for the transparency team at Terracotta and
this is not exactly correct. For example, while you can read
clustered state outside of a clustered lock, it's possible for the tc
memory manager to clear that state at any time, allowing you to see a
null instead of the real value. Generally, if you're not reading
under at least a read lock, you can see nulls.

Now, all is not lost because an immutable data structure requires only
a read lock and read locks can be checked out to all the nodes in the
cluster and used locally as a write will never occur. Read locks like
that will be served locally without crossing the network and will not
generate tc transactions. So, this will be a really cheap lock to get
even in the cluster.

Maps (especially CHM) are also a fairly specialized case as we heavily
instrument and modify the internal behavior. Map impls like HashMap,
Hashtable, and CHM are all "partial" in tc, meaning that the keys must
always be in all nodes but values are faulted lazily as needed.

Currently, volatiles are not really treated any differently than other
non-volatile fields - we require that they be accessed under a
clustered lock. The whole purpose of volatile isn't really possible
with Terracotta so we can't really do anything special. We do have
some internal support for automatically wrapping volatiles in
clustered locks but haven't surfaced that out to our config file
yet.

You're correct on the CAS - in general the stuff that really makes
single VM concurrency fly (like fine-grained locks with lightweight
impls like CAS) are awful for clustering. For things like sequences,
it's generally better to use checked out blocks of ids. We've got an
implementation of this if you want it (it's pretty straightforward).
If that's not sufficient, I'm sure we could come up with some other
solution.
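
The "checked out blocks of ids" idea can be sketched as follows. This
is not Terracotta's implementation, which is not shown in this thread;
BlockIdAllocator and its fields are hypothetical, and an AtomicLong
stands in for the cluster-wide counter.

```java
import java.util.concurrent.atomic.AtomicLong;

// Each node owns one allocator. It touches the shared counter only once
// per block and hands out the ids in that block locally.
final class BlockIdAllocator {
    private final AtomicLong shared; // stand-in for the clustered counter
    private final int blockSize;
    private long next;  // next id to hand out from the current block
    private long limit; // exclusive end of the current block

    BlockIdAllocator(AtomicLong shared, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.shared = shared;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == limit) {
            // Block exhausted (or never fetched): check out a new one.
            next = shared.getAndAdd(blockSize);
            limit = next + blockSize;
        }
        return next++;
    }
}
```

Ids produced this way are unique across nodes but not ordered across
them: one node may still be handing out low ids after another has
moved on to a later block. That is fine for sequences, but it would
matter for anything that relies on a global ordering.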

I have a keen interest in Clojure (awesome stuff Rich) and obviously
also Terracotta. If only I had another 5 hrs/day, I'd spend it on
Clojure/TC integration. :) I (and others at Terracotta) are
certainly happy to answer questions and help where we can. Probably
best to ask on our mailing lists as everyone in eng monitors the user
and dev lists:

http://www.terracotta.org/confluence/display/wiki/Mailing+Lists

Alex Miller

Alex Miller

Oct 20, 2008, 10:12:11 AM

to Clojure

Sorry, I was not thinking straight when I wrote part of this. We will
protect you from seeing nulls by faulting values back in, even if not
read under a lock. So, what you said originally there is true. There
is a possibility that you can get dirty reads in this scenario but in
the case of an immutable data structure that won't be an issue. Sorry
for the confusion...

Alex

Rich Hickey

Oct 20, 2008, 11:12:09 AM

to Clojure

On Oct 20, 10:12 am, Alex Miller <alexdmil...@yahoo.com> wrote:
> Sorry, I was not thinking straight when I wrote part of this. We will
> protect you from seeing nulls by faulting values back in, even if not
> read under a lock. So, what you said originally there is true. There
> is a possibility that you can get dirty reads in this scenario but in
> the case of an immutable data structure that won't be an issue. Sorry
> for the confusion...
>

Thanks for the clarification - that's great news. I remember that
being the answer when I first researched Terracotta.

Rich
