Clojure only (in-memory) database / locking granularity

Rowdy Rednose

unread,

Jun 30, 2009, 8:37:54 AM6/30/09

to Clojure

Would it be easy to implement an in-memory database in clojure that
allows concurrent access?

It sounds pretty easy, as most of the features are already provided by
Clojure. I'm not sure though about the "locking granularity".
Of course you don't pessimistically lock in Clojure, but if you have a
relation that is "protected" by a Ref, then concurrently adding two
rows to that relation will make one of the two transaction replay,
although the added rows could be completely independent (unless there
are unique constraints, for example, which, now that I think about it,
are present in every real-world table in the form of primary keys).

Furthermore, if you have constraints that involve other relations, say
foreign key constraints, then the locking granularity gets even
coarser, spanning all relations involved, and thus affecting every
mutating access to any one of those relations.

Is there a way to make this more fine-grained and still use the
existing clojure data structures? Or is this for some reason not a
problem in reality?

Has anyone actually implemented something like this in Clojure? You'd
think that many companies have a database that contains their clients
in relations like "Person", "Company", etc.

Chouser

unread,

Jun 30, 2009, 11:02:10 AM6/30/09

to clo...@googlegroups.com

On Tue, Jun 30, 2009 at 8:37 AM, Rowdy Rednose<rowdy....@gmx.net> wrote:
>
> Would it be easy to implement an in-memory database in clojure that
> allows concurrent access?
>
> It sounds pretty easy, as most of the features are already provided by
> Clojure. I'm not sure though about the "locking granularity".
> Of course you don't pessimistically lock in Clojure, but if you have a
> relation that is "protected" by a Ref, then concurrently adding two
> rows to that relation will make one of the two transaction replay,
> although the added rows could be completely independent (unless there
> are unique constraints, for example, which, now that I think about it,
> are present in every real-world table in the form of primary keys).

You could wrap a Ref around every value, if you chose, to
allow independent changes to different "rows" at the same
time -- though this would not help when inserting rows.

You could partition your key space somehow (perhaps by hash)
to allow n groups of rows, and protect each of those with
a Ref. This would allow insertions and deletions
independently as long as the operations were being applied
to different groups.

I imagine there are other possible solutions as well...

> Furthermore, if you have constraints that involve other relations, say
> foreign key constraints, then the locking granularity gets even
> coarser, spanning all relations involved, and thus affecting every
> mutating access to any one of those relations.

With either of the layouts I mentioned, this kind of
multi-relation transaction ought to allow Clojure STM to
really shine when compared to a more naive locking scheme.

--Chouser

Rowdy Rednose

unread,

Jul 1, 2009, 8:02:54 AM7/1/09

to Clojure

On Jul 1, 12:02 am, Chouser <chou...@gmail.com> wrote:
> You could wrap a Ref around every value, if you chose, to
> allow independent changes to different "rows" at the same
> time -- though this would not help when inserting rows.

I guess having millions of Refs would not perform too well. Plus you
don't need that level of concurrency any time soon, I guess (and
hope).

> You could partition your key space somehow (perhaps by hash)
> to allow n groups of rows, and protect each of those with
> a Ref. This would allow insertions and deletions
> independently as long as the operations were being applied
> to different groups.

Sounds feasible and reminds me of the lock striping done in
ConcurrentHashMap. It would allow me to use a number of refs that
depends on the number of available processors.
One thing that I wonder is whether this data structure should be
written in Java or in Clojure.

Is there any other database implemented in clojure besides
contrib.datalog? contrib.datalog doesn't have (write) concurrency and
it looks like that is something that cannot be added without changing
the interface.

I'm basically looking for a database that allows the (clojure) user to
listen on database updates that match certain criteria, just like the
"observers" in the "Functional Relational Model" paper by Moseley and
Marks.

But it looks like I have to implement that myself - which is not a
complaint, but I'm trying to estimate the amount of work necessary.
But I guess I could just start with one Ref per relation and then make
it more concurrent later - if I choose an interface that allows it.

Matt Culbreth

unread,

Jul 1, 2009, 8:18:01 AM7/1/09

to Clojure

On Jul 1, 8:02 am, Rowdy Rednose <rowdy.redn...@gmx.net> wrote:

>
> But it looks like I have to implement that myself - which is not a
> complaint, but I'm trying to estimate the amount of work necessary.
> But I guess I could just start with one Ref per relation and then make
> it more concurrent later - if I choose an interface that allows it.

This looks like a cool project--please keep us updated if you actually
begin it. I'm sure others wouldn't mind helping.

Rowdy Rednose

unread,

Jul 1, 2009, 9:08:22 AM7/1/09

to Clojure

I'm just trying to once and for all find or implement a clojure
solution for the simple problem that most companies (that make heaps
of money developing super boring software) have: There's relational
data like "Persons" and "Companies", that needs to be accessed and
manipulated by multiple client programs concurrently.

But even today, where everyone working on that kind of problem should
have seen frameworks like Ruby on Rails, there exist (especially in my
current job) horrible homegrown "frameworks", where adding a simple
new "business object" or making changes to an existing one is a
serious pain in the neck.
I've implemented different solutions to that problem myself for
different companies using standard sql databases and languages like
Java. None of these solutions was as bad as what I currently have to
put up with, but none was perfect either and each one had their
downside.

Think about (de)serialization, caching, high availability (failover
primary/backup) and so forth - all of these layers are hand-written
with custom code for every single business object in my current
company. And if this is the core of the system, you can imagine how
all the code that uses it looks like. It's inconsistent and very
brittle (concurrency).

That said, I have already implemented an in-memory database system in
clojure which is consistent (constraints) and allows to you to listen
for modifications, but it's my first attempt of something like this in
Clojure and therefore highly un-idiomatic, not clean and well-
structured (namespaces) and for all those reasons hard to maintain.

But now that I have something that actually works, I plan to
completely rewrite it from scratch in a cleaner way. But I want to get
some things clear first to see whether I'm going the right way.

One other thing that I want to do in that rewrite is define (or reuse)
languages for querying, manipulating and defining the schema and
constraints. Currently for example I accept any clojure function with
one argument for the check constraints. I want to restrict that to a
well-defined set of functions/operations that can be arbitrarily
combined.
One reason I want to this is so that I can persist all modifications
to disk (and thus make it durable), in a descriptive format that is
clojure-independent.
If these function can be arbitrary clojure code god knows what
problems I'm inviting - non-pure code, making the persistence files
depend on a certain version of clojure and so forth.

But if in the meantime someone pops up in the group saying that they
implemented a system similar to the one described in "Functional
Relational Programming", I'll probably immediately dump my code and
just use that... :)

Eric Willigers

unread,

Jul 1, 2009, 10:41:48 PM7/1/09

to Clojure

On Jul 1, 1:02 am, Chouser <chou...@gmail.com> wrote:

I think the implementation of "ensure" will need to change for this
use case to shine -
http://www.mail-archive.com/clo...@googlegroups.com/msg14915.html

Reply all

Reply to author

Forward