RFC: clj-ds Clojure data structure for Java (et al.)

Krukow

Jun 8, 2010, 8:34:34 AM
to Clojure
I would like to hear the group's opinion before (and if) I release
this to the general public.

http://github.com/krukow/clj-ds

README:
...
*WHY*
First, I love Clojure :) ...
Unfortunately sometimes clients require that I use Java...

The data structures used in the Clojure programming language are a
great implementation of a number of useful persistent data structures
(see e.g., the section on immutable data structures on
http://clojure.org/functional_programming).

However, these data structures are tailor-made to work optimally in
the Clojure programming language, and while it is possible to use
them from Java (since Clojure is distributed as a .jar file), it is
inconvenient for a number of reasons (see below). Since it is
(unfortunately) not always possible to use Clojure, I've created this
library to at least reap some of the Clojure benefits in environments
constrained to Java.
Beyond Java, other JVM languages like Erjang, Scala, JRuby and Groovy
may benefit from immutability & persistence, and from this
implementation.

*Advantages of clj-ds when constrained to working with Java*
Currently the Clojure data structures are implemented in Java. In the
future, all of Clojure will be implemented in Clojure itself (known as
"Clojure-in-Clojure").
This has many advantages for Clojure, but when it happens the data
structures will probably be even more intertwined with the rest of the
language, and may be even more inconvenient to use in a Java context.

The clj-ds project will maintain Java versions of the code and, where
possible, attempt to "port" improvements made in the Clojure versions
back into clj-ds, so that maintained Java versions of the data
structures remain available.

In the current Clojure version, calling certain methods on
PersistentHashMap requires loading the entire Clojure runtime,
including the bootstrap process. This takes about one second.
This means that the first time one of these methods is called, a Java
user will experience a slight delay (and a memory-usage increase).
Further, many of the Clojure runtime Java classes (e.g., the compiler)
are not needed when only support for persistent data structures is
wanted.

The clj-ds library does not depend on the Clojure runtime, nor does it
run any Clojure bootstrap process; for example, the classes that deal
with compilation have been removed.
This results in a smaller library, and the mentioned delay does not
occur.

Clojure is a dynamically typed language. Java is statically typed and
has supported 'generics' since version 5. A Java user would expect
generics support from a Java data structure library, and the Clojure
version doesn't have this. clj-ds will support generics.
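
To make the generics point concrete, here is a minimal sketch of what a
generic persistent structure can look like from the Java side. It assumes
Java 8 or later. PStack and GenericsSketch are hypothetical names made up
for this illustration; they are not clj-ds or Clojure classes, and the
real API may look quite different.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only: PStack is NOT part of clj-ds or Clojure.
class GenericsSketch {

    // A tiny persistent (immutable) stack with a type parameter.
    static final class PStack<T> implements Iterable<T> {
        private static final PStack<Object> EMPTY = new PStack<>(null, null, 0);

        private final T head;
        private final PStack<T> tail;
        private final int size;

        private PStack(T head, PStack<T> tail, int size) {
            this.head = head;
            this.tail = tail;
            this.size = size;
        }

        // The shared empty stack is safe to cast because it holds no elements.
        @SuppressWarnings("unchecked")
        public static <T> PStack<T> empty() {
            return (PStack<T>) EMPTY;
        }

        // Returns a new stack; the receiver is left unchanged.
        public PStack<T> push(T value) {
            return new PStack<>(value, this, size + 1);
        }

        public T peek() {
            if (size == 0) throw new NoSuchElementException();
            return head;
        }

        public int size() {
            return size;
        }

        @Override
        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private PStack<T> cur = PStack.this;

                public boolean hasNext() {
                    return cur.size > 0;
                }

                public T next() {
                    if (cur.size == 0) throw new NoSuchElementException();
                    T value = cur.head;
                    cur = cur.tail;
                    return value;
                }
            };
        }
    }

    public static void main(String[] args) {
        PStack<String> a = PStack.<String>empty().push("x");
        PStack<String> b = a.push("y"); // a still has one element

        String top = b.peek();          // no cast, checked at compile time
        System.out.println(top + " " + a.size() + " " + b.size());
    }
}
```

With a raw (non-generic) structure, the peek() call would return Object
and the caller would need a cast that can only fail at run time.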

Finally, a slight improvement.
Some of the Clojure data structure methods use Clojure's 'seq'
abstraction in the implementation of the Java 'iterator' pattern. It
is possible to make slightly more efficient iterators using a
tailor-made iterator. clj-ds does this.
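
The difference between the two iterator styles can be sketched as follows.
This is a simplified illustration over a plain array, assuming Java 8 or
later; Seq, ArraySeq and IteratorSketch are hypothetical names made up
here, not the actual Clojure or clj-ds classes. The seq-backed iterator
allocates a new object per step, while the tailor-made one only advances
an index.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only: none of these types are clj-ds classes.
class IteratorSketch {

    // Simplified stand-in for a seq: next() returns a new object, or null at the end.
    interface Seq<T> {
        T first();
        Seq<T> next();
    }

    static final class ArraySeq<T> implements Seq<T> {
        private final T[] items;
        private final int index;

        ArraySeq(T[] items, int index) {
            this.items = items;
            this.index = index;
        }

        public T first() {
            return items[index];
        }

        public Seq<T> next() {
            return index + 1 < items.length ? new ArraySeq<>(items, index + 1) : null;
        }
    }

    // Iterator built on the seq abstraction: one allocation per element.
    static <T> Iterator<T> seqIterator(final T[] items) {
        return new Iterator<T>() {
            private Seq<T> seq = items.length > 0 ? new ArraySeq<>(items, 0) : null;

            public boolean hasNext() {
                return seq != null;
            }

            public T next() {
                if (seq == null) throw new NoSuchElementException();
                T value = seq.first();
                seq = seq.next();
                return value;
            }
        };
    }

    // Tailor-made iterator: walks the backing array directly, no per-step allocation.
    static <T> Iterator<T> directIterator(final T[] items) {
        return new Iterator<T>() {
            private int index = 0;

            public boolean hasNext() {
                return index < items.length;
            }

            public T next() {
                if (index >= items.length) throw new NoSuchElementException();
                return items[index++];
            }
        };
    }

    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) total += it.next();
        return total;
    }

    public static void main(String[] args) {
        Integer[] data = {1, 2, 3};
        // Both iterators visit the same elements in the same order.
        System.out.println(sum(seqIterator(data)) == sum(directIterator(data)));
    }
}
```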

Example stats for iterating through various data structures
(20-40% improvement; matters only for quite large structures):

Structure          Elements    Java avg.  Pure Clojure avg.  clj-ds avg.
PersistentHashSet    500,000     58 ms         192 ms          106 ms
PersistentHashSet  1,000,000    104 ms         497 ms          371 ms
PersistentHashMap    500,000     94 ms         189 ms          131 ms
PersistentHashMap  1,000,000    128 ms         549 ms          394 ms
PersistentVector   1,000,000    104 ms         122 ms          104 ms
PersistentVector   2,000,000    186 ms         223 ms          184 ms

B Smith-Mannschott

Jun 8, 2010, 9:11:12 AM
to clo...@googlegroups.com
On Tue, Jun 8, 2010 at 14:34, Krukow <karl....@gmail.com> wrote:
> I would like to hear the groups opinion before (and if) I release this
> to the general public.
>
> http://github.com/krukow/clj-ds
>
> README:
> ...
> *WHY*
> First, I love Clojure :) ...
> Unfortunately sometimes clients require that I use Java...
>
> The data structures used in the Clojure programming language are a
> great
> implementation of a number of useful persistent data structures
> (see e.g., the section on immutable data structures on
> http://clojure.org/functional_programming).

...

Yes please!

One nit: you probably didn't mean to check in your build products. The
classes directory is full of *.class files.

Krukow

Jun 8, 2010, 10:08:11 AM
to Clojure


On Jun 8, 3:11 pm, B Smith-Mannschott <bsmith.o...@gmail.com> wrote:
> Yes please!

OK :)

> One nit: you probably didn't mean to check in your build products. The
> classes directory is full of *.class files.


No, I didn't want the class files - I was a bit fast on the commit.
I'll clean it up.

/Karl

Jason Smith

Jun 8, 2010, 5:32:27 PM
to Clojure
I really like the idea of modularizing the language, and not just the
Java parts, since Clojure code compiles to class files as well. The
clojure.jar / clojure-contrib.jar split is artificial, and the bigger
these libraries get, the more obvious it will become that they need to
be broken into multiple separate projects.

Mike Anderson

Jun 11, 2010, 9:59:00 AM
to Clojure
On Jun 8, 1:34 pm, Krukow <karl.kru...@gmail.com> wrote:
> I would like to hear the groups opinion before (and if) I release this
> to the general public.
>
> http://github.com/krukow/clj-ds

I really like this approach.

Not sure if it's any use, but I created a data structure library of my
own in Java which may have some useful code you can borrow (you are
free to use anything you like to include in this project).

http://code.google.com/p/mikeralib/source/browse/#svn/trunk/Mikera/src/mikera/persistent

In particular, there is a persistent String class
(mikera.persistent.Text) which I always thought would be a good fit
with Clojure.....