Serializing Clojure objects


Tayssir John Gabbour

Dec 2, 2008, 3:57:13 AM
to Clojure
Hi!

How should I approach serialization? I made a little test function
which serializes and deserializes Clojure objects. It works for
strings, integers, symbols, LazilyPersistentVectors and.. oddly..
PersistentHashMaps that have exactly one element. (My Clojure is about
a month old.)

But for other things, like keywords and most PersistentHashMaps, it
throws NotSerializableException.

My imagined possible solutions:

* Implement Serializable for Clojure data -- but is it possible in a
dynamic "Hey I'll just write a new method!" way?

* Go into Clojure's source and add Serializable to the Java
classes.


My end goal is using a nonrelational DB like Tokyo Cabinet or
BerkeleyDB.

Thanks,
Tayssir


PS: Here's my test code:

(defn my-identity
  "Copies obj through serialization and deserialization."
  [obj]
  (let [byte-out (new java.io.ByteArrayOutputStream)
        obj-out (new java.io.ObjectOutputStream byte-out)]
    (try (.writeObject obj-out obj)
         (finally (.close obj-out)))
    (let [obj-in (new java.io.ObjectInputStream
                   (new java.io.ByteArrayInputStream (.toByteArray byte-out)))]
      (try (.readObject obj-in)
           (finally (.close obj-in))))))


Tayssir John Gabbour

Dec 2, 2008, 5:04:55 AM
to Clojure
On Dec 2, 9:57 am, Tayssir John Gabbour <tayssir.j...@googlemail.com>
wrote:
> (defn my-identity
>   "Copies obj through serialization and deserialization."
>   [obj]
>   (let [byte-out (new java.io.ByteArrayOutputStream)
>         obj-out (new java.io.ObjectOutputStream byte-out)]
>     (try (.writeObject obj-out obj)
>          (finally (.close obj-out)))
>     (let [obj-in (new java.io.ObjectInputStream
>                    (new java.io.ByteArrayInputStream (.toByteArray byte-out)))]
>       (try (.readObject obj-in)
>            (finally (.close obj-in))))))

BTW, sorry for the absurd code; I should .close() byte-out instead of
obj-out.

And I'll have to figure out how to readably close a stream in a
finally clause.
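For closing streams readably, Clojure's with-open wraps its body in try/finally and calls .close on each binding. A minimal sketch of the same round trip, assuming a current Clojure release in which keywords and the persistent collections implement java.io.Serializable (the keyword and map failures reported at the top of this thread came from an older version). Note that closing the ObjectOutputStream is what flushes its buffered bytes into byte-out, so it is closed before toByteArray is called.

```clojure
(import '(java.io ByteArrayOutputStream ByteArrayInputStream
                  ObjectOutputStream ObjectInputStream))

;; Assumes a Clojure version in which keywords and persistent
;; collections implement java.io.Serializable.
(defn my-identity
  "Copies obj through Java serialization and deserialization."
  [obj]
  (let [byte-out (ByteArrayOutputStream.)]
    ;; with-open closes obj-out when its body exits, even if writeObject throws.
    ;; Closing the ObjectOutputStream flushes its buffer into byte-out,
    ;; so the bytes are complete before toByteArray is called below.
    (with-open [obj-out (ObjectOutputStream. byte-out)]
      (.writeObject obj-out obj))
    (with-open [obj-in (ObjectInputStream.
                         (ByteArrayInputStream. (.toByteArray byte-out)))]
      (.readObject obj-in))))
```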


Tayssir

Luc Prefontaine

Dec 2, 2008, 5:12:35 AM
to clo...@googlegroups.com
I use YAML to serialize. I needed a simple way to pass Maps, Lists and Vectors between Java, Ruby and Clojure components.
I change the Clojure class names in the YAML output to java.util.Map, List and so on, to remove dependencies
on Clojure classes while retaining the ability to walk through the structures using these basic types in "foreign"
components. In Java it's pretty obvious: Map, List and Vector are always present, and in Ruby these things are
also part of the core language.

Have a look at http://jyaml.sourceforge.net/

Essentially it boils down to something like this:

(def yaml-config (YamlConfig.)) ;; YamlConfig comes from JYaml
(.load yaml-config yamlmsg) ;; Loads a YAML representation into an equivalent object representation
(.dump yaml-config msg)     ;; Dumps an object to a YAML string.

I extended the library a bit to deal transparently with types like java.sql.Date (I deal with several databases),
but nothing else was changed. Just beware of binary characters in your strings; I encoded these with XML/HTML escapes before serializing.
I need to talk to the maintainer about this issue.

I never liked Java serialization, mainly because:

a) The binary representation of classes has to be exactly the same at both ends, otherwise you are stuck in a dead end.

b) You need that !@!@%$%@#$%$ Serializable interface, which Java should implement everywhere by default instead of you.
    An embedded object lacks the interface? Well, you find that out at run time, and good luck.

c) It's not easy to debug, since it's not human-readable.

d) It makes upgrading a distributed environment a pain in the ass, since you may have to upgrade everything even if no major
    changes occurred in your classes. You added a method to a class that is irrelevant to most of the components?
    That single change forces you to upgrade everything... this is a typical example of developers disconnected from real life.
    In real life your systems are running, and you may not be able to interrupt services for a long period to upgrade them
    all at once. You may have to do so in multiple steps, without interrupting the service.

e) I want the data serialized, not the access to it...

If the size of the YAML output becomes an issue, then zip it.
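Zipping can be done with the JDK's java.util.zip classes. A minimal sketch, assuming the YAML (or any other text) is already held in a string; gzip-str and gunzip-str are hypothetical helper names, not part of JYaml or Clojure.

```clojure
(import '(java.io ByteArrayOutputStream ByteArrayInputStream)
        '(java.util.zip GZIPOutputStream GZIPInputStream))

(defn gzip-str
  "Hypothetical helper: compresses a string, encoded as UTF-8, into a gzip byte array."
  [^String s]
  (let [bytes-out (ByteArrayOutputStream.)]
    ;; Closing the GZIPOutputStream finishes the deflate stream and
    ;; writes the gzip trailer into bytes-out.
    (with-open [gz (GZIPOutputStream. bytes-out)]
      (.write gz (.getBytes s "UTF-8")))
    (.toByteArray bytes-out)))

(defn gunzip-str
  "Hypothetical helper: inverse of gzip-str."
  [^bytes data]
  ;; slurp reads the decompressed stream to the end and closes it.
  (slurp (GZIPInputStream. (ByteArrayInputStream. data)) :encoding "UTF-8"))
```

Repetitive text such as YAML keys tends to compress well, at the cost of the output no longer being human-readable on the wire.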

Luc

Tayssir John Gabbour

Dec 2, 2008, 5:36:19 AM
to Clojure
Interesting, thanks for the new perspective! Using YAML seems more
flexible than what I was thinking, particularly since Clojure
apparently doesn't make me worry too much about the specific kind of
sequence/map I'm using.

(Yeah, I have an app which sends serialized objects all over the
place, and one thing I didn't like was how brittle my serialization
tool was when I made a little change to object definitions. (This is
in Common Lisp.) In my next release, I'd like to make it less strict.)


Tayssir



Parth Malwankar

Dec 2, 2008, 7:02:58 AM
to Clojure


Tayssir John Gabbour wrote:
> Hi!
>
> How should I approach serialization? I made a little test function
> which serializes and deserializes Clojure objects. It works for
> strings, integers, symbols, LazilyPersistentVectors and.. oddly..
> PersistentHashMaps that have exactly one element. (My Clojure is about
> a month old.)
>

I am not much of a Java guy, so this is what I would do.
Clojure has reader syntax for its data structures (maps, vectors, etc.),
so they can easily be written to a stream as text (e.g. with pr).

Reading it is equally simple:

user=> (def a (with-in-str "{:a 1 :b 2}" (read)))
#'user/a
user=> a
{:a 1, :b 2}
user=>

So as long as your data has reader syntax, it's not much
of an issue. If the data needs to be shared between languages,
as someone highlighted, you may consider using another
format like JSON.

If the data uses Java objects, it may not be serializable so
easily. Generally my approach is to stick to data structures
at the Clojure level, since they have reader syntax.
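The approach above as a full round trip: a minimal sketch that writes with pr-str and reads back either with the ordinary reader or with clojure.edn. clojure.edn exists in Clojure 1.5 and later (well after this thread) and reads plain data without evaluating any code. The sample map is made up for illustration.

```clojure
(require '[clojure.edn :as edn])

;; Illustrative data: only types that have reader syntax.
(def data {:name "Tayssir" :tags #{:lisp :clojure} :scores [1 2 3]})

;; Write: pr-str prints readably (strings keep their quotes, keywords their colons).
(def text (pr-str data))

;; Read back with the ordinary Clojure reader...
(def via-reader (read-string text))

;; ...or with clojure.edn, which only reads data and never evaluates code.
(def via-edn (edn/read-string text))
```

To write to a file or socket instead of a string, pr can be called with *out* bound to a java.io.Writer.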

Parth

Dakshinamurthy Karra

Dec 2, 2008, 7:10:07 AM
to clo...@googlegroups.com
Don't forget XMLEncoder/XMLDecoder. They come in pretty handy when you
want to serialize objects that (already) follow bean conventions.
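A minimal sketch of an XMLEncoder/XMLDecoder round trip from Clojure. It assumes the data is first copied into a java.util.HashMap, since XMLEncoder relies on bean conventions and Java's standard mutable collections, while Clojure's persistent maps are immutable and presumably won't encode directly. xml-round-trip is a hypothetical name.

```clojure
(import '(java.beans XMLEncoder XMLDecoder)
        '(java.io ByteArrayOutputStream ByteArrayInputStream))

(defn xml-round-trip
  "Hypothetical helper: writes obj with XMLEncoder and reads it back with XMLDecoder."
  [obj]
  (let [bytes-out (ByteArrayOutputStream.)]
    ;; Closing the encoder flushes it and writes the closing </java> element.
    (with-open [encoder (XMLEncoder. bytes-out)]
      (.writeObject encoder obj))
    (with-open [decoder (XMLDecoder. (ByteArrayInputStream. (.toByteArray bytes-out)))]
      (.readObject decoder))))

;; Copy the persistent map into a mutable java.util.HashMap before encoding.
(def decoded (xml-round-trip (java.util.HashMap. {"a" 1 "b" 2})))
```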

-- KD

Dakshinamurthy Karra
(blog: http://blog.marathontesting.com)
(daily dose: http://twitter.com/marathontesting)

Tayssir John Gabbour

Dec 2, 2008, 11:33:05 AM
to Clojure
JBossSerialization looks nifty, though I haven't tried it yet:
http://www.jboss.org/serialization/

Thanks to everyone who responded! (I've just been immersing myself in
Externalizable, object versioning, etc; and your thoughts have been
helpful.)


All best,
Tayssir



Jeff Rose

Dec 2, 2008, 12:08:04 PM
to clo...@googlegroups.com
I've been working on the same issue. So far it has mostly been just
researching various options, but I can give you my two cents...

It really depends on your goals and constraints. I have narrowed it down
to two major families of serialization for storage and networking. One
is the JSON/YAML/XML style, where you generate a serialized version of
data structures primarily based on vectors and hashes that contain only
simple data types. (Note: JSON is a subset of YAML, so you can parse
JSON with YAML but not vice versa.) This is by far the fastest to
develop and the most lightweight in terms of programmer time.
Basically one line each for read/write. The potential hidden cost
depends on what data structures you use in your program. If you have
clearly defined chunks of data to serialize, YAML works nicely, but for
more complex structures you often have to do an intermediate conversion
to simpler data structures where you deal by hand with things like
circular references and pointers to ephemeral data that you don't want
serialized.

The previous options are, however, inefficient for storage, transmission
and parsing in comparison to a more strictly defined protocol. If you
need raw performance and you are willing to spend the effort defining
your protocol, then I think something like the Google protocol buffers
or Facebook thrift are good options. They are basically the new-school
versions of CORBA RPC. In essence, you define a schema for your
messages or data serialization units, and then some tools generate
classes or functions that are used to read/write and transmit this
data. (SOAP pretty much works the same way, but it idiotically sits on
XML too, so you get the worst of both worlds...) Again, if your data
units to be serialized are self-contained, this can work pretty smoothly,
but in more complex structures you will also have to convert between the
simple, generated classes and your more complex application classes.
The real work though, is in creating and maintaining your protocol
definitions and the code that uses the generated classes.

I think the default for a language like Clojure should be YAML too. For
dynamic languages where developer time is the focus it is by far the
quickest mechanism to get up and running using databases, configuration
files, networking, etc. Maybe we should look into integrating the
built-in Clojure data-types with a YAML library, or otherwise creating a
new one, so we can dump and load directly between serialized strings and
Clojure data structures.

If you run up against the limits of YAML, then I would go with protocol
buffers. They seem like a clean and efficient way to support
multi-language communication without wasting time writing a bunch of
custom serialization methods. It would be interesting if there was a
way to sort of generate .proto files by example, by sniffing YAML on the
wire or something... It could at least help bootstrap the protocol
definition phase.

Hopefully that helps.

-Jeff

Rich Hickey

Dec 2, 2008, 1:52:44 PM
to Clojure


On Dec 2, 7:02 am, Parth Malwankar <parth.malwan...@gmail.com> wrote:
> Tayssir John Gabbour wrote:
> > Hi!
>
> > How should I approach serialization? I made a little test function
> > which serializes and deserializes Clojure objects. It works for
> > strings, integers, symbols, LazilyPersistentVectors and.. oddly..
> > PersistentHashMaps that have exactly one element. (My Clojure is about
> > a month old.)
>
> I am not much of a Java guy, so this is what I would do.
> Clojure has reader syntax for its data structures (maps, vectors, etc.),
> so they can easily be written to a stream as text (e.g. with pr).
>
> Reading it is equally simple:
>
> user=> (def a (with-in-str "{:a 1 :b 2}" (read)))
> #'user/a
> user=> a
> {:a 1, :b 2}
> user=>
>
> So as long as your data has reader syntax, it's not much
> of an issue. If the data needs to be shared between languages,
> as someone highlighted, you may consider using another
> format like JSON.
>
> If the data uses Java objects, it may not be serializable so
> easily. Generally my approach is to stick to data structures
> at the Clojure level, since they have reader syntax.
>

Yes, please consider print/read. It is readable text, works with a lot
of data structures, and is extensible.

As part of AOT I needed to enhance print/read to store constants of
many kinds, and restore faithfully. This led to a new multimethod -
print-dup, for high-fidelity printing. You can get print-dup behavior
by binding *print-dup*:

(binding [*print-dup* true]
  (dorun
    (map prn
         [[1 2 3]
          {4 5 6 7}
          (java.util.ArrayList. [8 9])
          String
          "string"
          42M
          :hello
          #"ethel"
          (sorted-set 9 8 7 6)
          #'rest])))

[1 2 3]
{4 5, 6 7}
#=(java.util.ArrayList. [8 9])
#=java.lang.String
"string"
42M
:hello
#"ethel"
#=(clojure.lang.PersistentTreeSet/create [6 7 8 9])
#=(var clojure.core/rest)

It can handle all of the Clojure data structures (including sorted
variants), Java collections, classes etc.

You can extend it to new types by defining the print-dup method for
the type.
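A minimal sketch of extending print-dup to a new type, assuming a current Clojure with deftype (which arrived after this thread). Temperature is a hypothetical type. The method writes a #=(ClassName. arg) form in the same style as the java.util.ArrayList line above, so that the reader, with *read-eval* at its default of true, rebuilds the object by calling the constructor.

```clojure
;; Hypothetical type used only for illustration.
(deftype Temperature [celsius])

(defmethod print-dup Temperature [^Temperature t ^java.io.Writer w]
  ;; Emit #=(<class-name>. <celsius>) so that reading it calls the constructor.
  ;; .getName yields the munged class name, e.g. user.Temperature.
  (.write w (str "#=(" (.getName (class t)) ". "))
  (print-dup (.celsius t) w)
  (.write w ")"))
```

Usage: (binding [*print-dup* true] (pr-str (Temperature. 21))) produces a string that read-string turns back into a Temperature instance.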

Rich

Parth Malwankar

Dec 2, 2008, 2:21:12 PM
to Clojure


On Dec 2, 11:52 pm, Rich Hickey <richhic...@gmail.com> wrote:
> As part of AOT I needed to enhance print/read to store constants of
> many kinds, and restore faithfully. This led to a new multimethod -
> print-dup, for high-fidelity printing. You can get print-dup behavior
> by binding *print-dup*:
>
...

>
> It can handle all of the Clojure data structures (including sorted
> variants), Java collections, classes etc.
>
> You can extend it to new types by defining the print-dup method for
> the type.
>
> Rich

Very cool. I did not know much about print-dup and the support
for java collections. Thanks.

Parth

Mark Volkmann

Dec 2, 2008, 3:52:22 PM
to clo...@googlegroups.com

I don't understand what the print-dup mechanism outputs or how to
reconstruct objects from that output later. I was expecting an API
similar to this:

(def my-string (print-dup [1 2 3]))
(def my-data (read my-string))

Can you give a simple example of serializing and deserializing a
Clojure collection?

--
R. Mark Volkmann
Object Computing, Inc.

Chouser

Dec 2, 2008, 6:38:59 PM
to clo...@googlegroups.com
On Tue, Dec 2, 2008 at 3:52 PM, Mark Volkmann <r.mark....@gmail.com> wrote:
>
> (def my-string (print-dup [1 2 3]))
> (def my-data (read my-string))
>
> Can you give a simple example of serializing and deserializing a
> Clojure collection?

For "serializing" you have a couple options:
(def my-string (binding [*print-dup* true] (pr-str [1 2 3])))
(def my-string (with-out-str (print-dup [1 2 3] *out*)))

I don't know which is "better". Both produce a my-string like:
"#=(clojure.lang.LazilyPersistentVector/create [1 2 3])"


The regular 'read' function can read these *print-dup* strings to
reproduce the original structure, a.k.a. deserialize:

(def my-data (with-in-str my-string (read)))
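The two halves wrapped in a pair of helpers: a minimal sketch that uses the *print-dup* binding form (the form Rich recommends further down). serialize and deserialize are hypothetical names. Because #=(...) forms are evaluated as they are read, this should only be used on trusted input.

```clojure
(defn serialize
  "Hypothetical helper: prints x readably with print-dup semantics."
  [x]
  (binding [*print-dup* true]
    (pr-str x)))

(defn deserialize
  "Hypothetical helper: reads back a string produced by serialize.
  #=(...) forms are evaluated, so only use this on trusted input."
  [^String s]
  (binding [*read-eval* true]
    (read-string s)))
```

With print-dup, a java.util.ArrayList comes back as an ArrayList rather than as a Clojure vector, which is the "high-fidelity" part.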

--Chouser

Rich Hickey

Dec 3, 2008, 7:21:22 AM
to clo...@googlegroups.com
On Tue, Dec 2, 2008 at 6:38 PM, Chouser <cho...@gmail.com> wrote:

> On Tue, Dec 2, 2008 at 3:52 PM, Mark Volkmann <r.mark....@gmail.com> wrote:
> >
> > (def my-string (print-dup [1 2 3]))
> > (def my-data (read my-string))
> >
> > Can you give a simple example of serializing and deserializing a
> > Clojure collection?
>
> For "serializing" you have a couple options:
> (def my-string (binding [*print-dup* true] (pr-str [1 2 3])))
> (def my-string (with-out-str (print-dup [1 2 3] *out*)))
>
> I don't know which is "better".

The former (binding *print-dup*) is better and the only way to ensure things work correctly in aggregates.

Rich

