binary serialization
From: fft1976 <fft1...@gmail.com>
Date: Mon, 10 Aug 2009 19:25:43 -0700 (PDT)
Local: Mon, Aug 10 2009 10:25 pm
Subject: binary serialization
Is there a way to do binary serialization of Clojure/Java values?
ASCII (read) and (write) are nice, but they waste space, truncate
floats, and are probably slow compared to binary serialization.

 
From: "Kyle R. Burton" <kyle.bur...@gmail.com>
Date: Mon, 10 Aug 2009 22:42:49 -0400
Local: Mon, Aug 10 2009 10:42 pm
Subject: Re: binary serialization

> Is there a way to do binary serialization of Clojure/Java values?
> ASCII (read) and (write) are nice, but they waste space, truncate
> floats, and are probably slow compared to binary serialization.

The following utility functions have worked in many cases for me:

(defn object->file [obj file]
  (with-open [outp (java.io.ObjectOutputStream.
                    (java.io.FileOutputStream. file))]
    (.writeObject outp obj)))

(defn file->object [file]
  (with-open [inp (java.io.ObjectInputStream. (java.io.FileInputStream. file))]
    (.readObject inp)))

(defn freeze
  ([obj]
     (with-open [baos (java.io.ByteArrayOutputStream. 1024)
                 oos  (java.io.ObjectOutputStream. baos)]
       (.writeObject oos obj)
       (.toByteArray baos)))
  ([obj & objs]
     (freeze (vec (cons obj objs)))))

One caveat though is that currently some of the clojure data types
(like Symbols) that I thought would have been serializable are not.  I
think that in the case of Clojure symbols it is being addressed
though.
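
The file helpers above can be exercised with a quick round trip. This is only a sketch: the temp-file name, the sample map and the names tmp, data and restored are made up for illustration, and the map uses string keys so the example does not depend on which Clojure types are Serializable in a given release.

```clojure
(import '(java.io File FileInputStream FileOutputStream
                  ObjectInputStream ObjectOutputStream))

(defn object->file [obj file]
  (with-open [outp (ObjectOutputStream. (FileOutputStream. file))]
    (.writeObject outp obj)))

(defn file->object [file]
  (with-open [inp (ObjectInputStream. (FileInputStream. file))]
    (.readObject inp)))

;; Hypothetical sample data and temp file, used only for this demonstration.
(def tmp  (doto (File/createTempFile "roundtrip" ".ser") (.deleteOnExit)))
(def data {"name" "example" "values" [1.0 2.5 (/ 1.0 3.0)]})

(object->file data tmp)
(def restored (file->object tmp))

;; Doubles are written as their 8-byte binary form, so nothing is truncated.
(= data restored)
```

Comparing the restored value with = (rather than printing both) checks the doubles exactly instead of through their text representation.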

Hope this helps,

Regards,

Kyle Burton


 
From: "Kyle R. Burton" <kyle.bur...@gmail.com>
Date: Mon, 10 Aug 2009 22:57:35 -0400
Local: Mon, Aug 10 2009 10:57 pm
Subject: Re: binary serialization

Sorry, forgot to offer up the inverse of freeze, thaw:

(defn thaw [bytes]
  (with-open [bais (java.io.ByteArrayInputStream. bytes)
              ois  (java.io.ObjectInputStream. bais)]
    (.readObject ois)))

Regards,

Kyle


 
From: fft1976 <fft1...@gmail.com>
Date: Mon, 10 Aug 2009 20:10:24 -0700 (PDT)
Local: Mon, Aug 10 2009 11:10 pm
Subject: Re: binary serialization
On Aug 10, 7:57 pm, "Kyle R. Burton" <kyle.bur...@gmail.com> wrote:

Does all this work with cycles, Java arrays, etc.?

 
From: "Kyle R. Burton" <kyle.bur...@gmail.com>
Date: Mon, 10 Aug 2009 23:19:22 -0400
Local: Mon, Aug 10 2009 11:19 pm
Subject: Re: binary serialization

> Does all this work with cycles, Java arrays, etc.?

It will work with anything that implements the Serializable interface
in Java.  Arrays do implement that interface, as do all the
primitives.  With respect to cycles, I'd suspect it does, but would
test it.  If you have a repl handy it should be pretty easy to test
those functions out on your data structures.
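
A REPL check along those lines might look like the sketch below. It repeats single-argument versions of freeze and thaw so it can be pasted on its own; the names arr-copy, cyclic and cyclic-copy are invented for the example, and a self-containing java.util.ArrayList stands in for "a structure with a cycle". The values are bound with def rather than printed, because printing a self-containing list would recurse forever.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  ObjectInputStream ObjectOutputStream)
        '(java.util ArrayList))

(defn freeze [obj]
  (with-open [baos (ByteArrayOutputStream.)
              oos  (ObjectOutputStream. baos)]
    (.writeObject oos obj)
    (.flush oos)
    (.toByteArray baos)))

(defn thaw [^bytes bs]
  (with-open [ois (ObjectInputStream. (ByteArrayInputStream. bs))]
    (.readObject ois)))

;; A Java array of primitives: compare contents after the round trip.
(def arr-copy (thaw (freeze (double-array [1.5 2.5 3.5]))))
(= [1.5 2.5 3.5] (vec arr-copy))

;; A mutable list that contains itself: about the smallest possible cycle.
(def cyclic (doto (ArrayList.) (.add "head")))
(.add cyclic cyclic)
(def cyclic-copy (thaw (freeze cyclic)))

;; The copy's second element is the copy itself, so the cycle survived.
(identical? cyclic-copy (.get cyclic-copy 1))
```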

What class has the cycle?  Is it a standard collection?

Regards,

Kyle

--
------------------------------------------------------------------------------
kyle.bur...@gmail.com                            http://asymmetrical-view.com/
------------------------------------------------------------------------------


 
From: fft1976 <fft1...@gmail.com>
Date: Mon, 10 Aug 2009 21:17:46 -0700 (PDT)
Local: Tues, Aug 11 2009 12:17 am
Subject: Re: binary serialization
On Aug 10, 8:19 pm, "Kyle R. Burton" <kyle.bur...@gmail.com> wrote:

> > Does all this work with cycles, Java arrays, etc.?

> It will work with anything that implements the Serializable interface
> in Java.  Arrays do implement that interface, as do all the
> primitives.  With respect to cycles, I'd suspect it does, but would
> test it.  If you have a repl handy it should be pretty easy to test
> those functions out on your data structures.

> What class has the cycle?  Is it a standard collection?

Cycles are a special case of substructure sharing. Let's talk about
that instead.

(def common [1 2 3 4 5])
(def a [6 common])
(def b [7 common])
(def c [a b])

If you are serializing c, I want "common" to get copied only once.

I don't know JVM too well, but I think no efficient user-level
solution is possible. Why? To take care of substructure sharing, you
need to remember a set of shareable values that have already been
serialized, and do "reference equality" comparisons when new
substructures are serialized.

This comparison and a set implementation can easily be done with
pointers (because you have "<"), but there are no pointers in the JVM,
and no "reference inequality", so you must use linear seeks, making
the time complexity of serialization quadratic, where in C/C++ it
could be O(N log N)


 
From: Christian Vest Hansen <karmazi...@gmail.com>
Date: Tue, 11 Aug 2009 10:07:27 +0200
Local: Tues, Aug 11 2009 4:07 am
Subject: Re: binary serialization
Java object serialization handles cycles based on object identity.
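
The same identity tracking covers the shared-substructure example from earlier in the thread. A minimal sketch, assuming a Clojure release in which persistent vectors are Serializable, and repeating single-argument freeze and thaw so it runs on its own (c2 is a name made up here):

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  ObjectInputStream ObjectOutputStream))

(defn freeze [obj]
  (with-open [baos (ByteArrayOutputStream.)
              oos  (ObjectOutputStream. baos)]
    (.writeObject oos obj)
    (.flush oos)
    (.toByteArray baos)))

(defn thaw [^bytes bs]
  (with-open [ois (ObjectInputStream. (ByteArrayInputStream. bs))]
    (.readObject ois)))

(def common [1 2 3 4 5])
(def a [6 common])
(def b [7 common])
(def c [a b])

(def c2 (thaw (freeze c)))

;; Both inner vectors of the copy point at one and the same "common" object,
;; so it was written to the stream once and referenced the second time.
(identical? (get-in c2 [0 1]) (get-in c2 [1 1]))   ; => true
```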

--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

 
From: John Harrop <jharrop...@gmail.com>
Date: Tue, 11 Aug 2009 11:36:45 -0400
Local: Tues, Aug 11 2009 11:36 am
Subject: Re: binary serialization

On Tue, Aug 11, 2009 at 12:17 AM, fft1976 <fft1...@gmail.com> wrote:
> I don't know JVM too well, but I think no efficient user-level
> solution is possible. Why? To take care of substructure sharing, you
> need to remember a set of shareable values that have already been
> serialized, and do "reference equality" comparisons when new
> substructures are serialized.

> This comparison and a set implementation can easily be done with
> pointers (because you have "<"), but there are no pointers in the JVM,
> and no "reference inequality", so you must use linear seeks, making
> the time complexity of serialization quadratic, where in C/C++ it
> could be O(N log N)

Reference equality is available in the JVM (instructions if_acmpeq and
if_acmpne), in Java (operators == and !=), and in Clojure (predicate
identical?). Furthermore, although < on pointers isn't available (and so
neither is a tree-map keyed on already-serialized structures), Java provides
System.identityHashCode() and IdentityHashMap. These use a hash that
respects reference equality. So one in fact can implement one's own
serialization that is O(n) using O(1) hashmap lookups (and using reflection,
and not working if SecurityManager won't let you setAccessible private
fields and the like, so not in an unsigned applet).
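
The bookkeeping half of that idea can be sketched without any reflection by restricting it to nested Clojure collections. The function index-by-identity and the name handles are hypothetical, invented for this illustration; a real serializer would also have to write out fields, which this does not attempt. Each collection object is looked up and inserted once, so the walk is linear in the number of objects reached.

```clojure
(import '(java.util IdentityHashMap))

(defn index-by-identity
  "Walks a nested collection depth-first and returns an IdentityHashMap from
  each distinct collection object to the handle (0, 1, 2, ...) assigned when
  it was first seen. Objects already seen are not walked again."
  [root]
  (let [seen (IdentityHashMap.)]
    (letfn [(visit [x]
              (when (and (coll? x) (not (.containsKey seen x)))
                (.put seen x (.size seen))
                (run! visit x)))]
      (visit root))
    seen))

(def common [1 2 3 4 5])
(def c [[6 common] [7 common]])
(def handles (index-by-identity c))

(.size handles)        ; => 4: c, [6 common], common, [7 common]
(.get handles common)  ; => 2: common was the third object encountered
```

A serializer built on this would emit the full object the first time a handle is assigned and only the handle on later encounters.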

(Another use for reference equality is to see if Double.valueOf() is
caching, something that arose as an issue in another thread. On my system,
Sun JVM 1.6.0_13 -server and Clojure 1.0.0, it apparently is not:

user=> (identical? 2.0 2.0)
false

If this comes out to true then it's caching. Integer.valueOf() is caching on
my system, but only for small integers:

user=> (identical? 1 1)
true
user=> (identical? 5 5)
true
user=> (identical? 50 50)
true
user=> (identical? 500 500)
false
user=> (identical? 255 255)
false
user=> (identical? 127 127)
true
user=> (identical? 128 128)
false

The threshold seems to be at values that will fit in one signed byte
(-128 to 127), which is the range Integer.valueOf() is documented to cache.

[Remember the literals get boxed when passed to a function like identical?
that isn't inlined. And identical? isn't inlined:

user=> (meta (var identical?))
{:ns #<Namespace clojure.core>, :name identical?, :doc "Tests if 2 arguments
are the same object", :arglists ([x y])}

whereas two-argument + is:

user=> (meta (var +))
{:ns #<Namespace clojure.core>, :name +, :file "clojure/core.clj", :line
549, :arglists ([] [x] [x y] [x y & more]), :inline-arities #{2}, :inline
#<core$fn__3329 clojure.core$fn__3329@d337d3>, :doc "Returns the sum of
nums. (+) returns 0."}

You can also use ^#'identical? and ^#'+ but I like my Clojure looking like
Lisp, not like perl. :)])


 
From: John Harrop <jharrop...@gmail.com>
Date: Tue, 11 Aug 2009 11:15:10 -0400
Local: Tues, Aug 11 2009 11:15 am
Subject: Re: binary serialization

On Mon, Aug 10, 2009 at 10:57 PM, Kyle R. Burton <kyle.bur...@gmail.com> wrote:

> Sorry, forgot to offer up the inverse of freeze, thaw:

> (defn thaw [bytes]
>  (with-open [bais (java.io.ByteArrayInputStream. bytes)
>              ois  (java.io.ObjectInputStream. bais)]
>    (.readObject ois)))

> Regards,

> Kyle

Which in turn gives us this, otherwise sorely lacking from the Java standard
library, but much less useful to us Clojurians who tend to mainly use
immutable objects:

(defn deep-copy [obj]
  (thaw (freeze obj)))

(Object.clone() does a shallow copy and typically isn't as widely available
as Serializable.)
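
To see the difference from a shallow copy, here is a small sketch with nested mutable java.util.ArrayLists. The names inner, original, shallow and deep are made up for the example, and freeze and thaw are repeated in single-argument form so the snippet is self-contained.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  ObjectInputStream ObjectOutputStream)
        '(java.util ArrayList))

(defn freeze [obj]
  (with-open [baos (ByteArrayOutputStream.)
              oos  (ObjectOutputStream. baos)]
    (.writeObject oos obj)
    (.flush oos)
    (.toByteArray baos)))

(defn thaw [^bytes bs]
  (with-open [ois (ObjectInputStream. (ByteArrayInputStream. bs))]
    (.readObject ois)))

(defn deep-copy [obj]
  (thaw (freeze obj)))

(def inner    (doto (ArrayList.) (.add 1)))
(def original (doto (ArrayList.) (.add inner)))

(def shallow (.clone ^ArrayList original))  ; new outer list, same inner list
(def deep    (deep-copy original))          ; new outer list and new inner list

;; Mutate the inner list after both copies were taken.
(.add ^ArrayList inner 2)

(vec (.get ^ArrayList shallow 0))  ; => [1 2], the clone still shares inner
(vec (.get ^ArrayList deep 0))     ; => [1], the deep copy is unaffected
```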


 
From: fft1976 <fft1...@gmail.com>
Date: Tue, 11 Aug 2009 11:31:48 -0700 (PDT)
Local: Tues, Aug 11 2009 2:31 pm
Subject: Re: binary serialization

On Aug 11, 8:36 am, John Harrop <jharrop...@gmail.com> wrote:

> System.identityHashCode() and IdentityHashMap. These use a hash that
> respects reference equality. So one in fact can implement one's own
> serialization that is O(n) using O(1) hashmap lookups (and using reflection,
> and not working if SecurityManager won't let you setAccessible private
> fields and the like, so not in an unsigned applet).

Good to know, thanks. By the way, hash table operations are O(log N),
because calculating the hash needs to be O(log N), but I'm nitpicking
now.

 
From: fft1976 <fft1...@gmail.com>
Date: Tue, 11 Aug 2009 11:33:30 -0700 (PDT)
Local: Tues, Aug 11 2009 2:33 pm
Subject: Re: binary serialization
On Aug 11, 8:15 am, John Harrop <jharrop...@gmail.com> wrote:

> Which in turn gives us this, otherwise sorely lacking from the Java standard
> library, but much less useful to us Clojurians who tend to mainly use
> immutable objects:

> (defn deep-copy [obj]
>   (thaw (freeze obj)))

Somebody should benchmark that vs manual implementations of deep copy
in Java.

 
From: John Harrop <jharrop...@gmail.com>
Date: Tue, 11 Aug 2009 23:00:56 -0400
Local: Tues, Aug 11 2009 11:00 pm
Subject: Re: binary serialization

On Tue, Aug 11, 2009 at 2:31 PM, fft1976 <fft1...@gmail.com> wrote:
> On Aug 11, 8:36 am, John Harrop <jharrop...@gmail.com> wrote:

> > System.identityHashCode() and IdentityHashMap. These use a hash that
> > respects reference equality. So one in fact can implement one's own
> > serialization that is O(n) using O(1) hashmap lookups (and using reflection,
> > and not working if SecurityManager won't let you setAccessible private
> > fields and the like, so not in an unsigned applet).

> Good to know, thanks. By the way, hash table operations are O(log N),
> because calculating the hash needs to be O(log N), but I'm nitpicking
> now.

The time taken to calculate an object's hash depends on that object's class.
For a String for instance it is linear in the String's length; for an
Integer, it is constant. Furthermore, hashes are often cached
(java.lang.String caches its hash for example). So if A and B are objects
that share a common reference to an object C in their fields, and use C's
hash code in computing their own, C's hash code may be computed only once,
and the cost of hashing A and B may be lower than the sum of the cost of
hashing only A and the cost of hashing only B. Furthermore, if A and B are
serialized again later, their own hashes may not need to be recalculated...
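
For the identity-based lookup discussed earlier, the content hash is not what gets used: IdentityHashMap keys on System.identityHashCode() and ==, neither of which inspects the object's contents. A small sketch of the difference, with the names s1, s2, by-value and by-identity invented for the example:

```clojure
(import '(java.util HashMap IdentityHashMap))

;; Two distinct String objects with equal contents.
(def s1 (String. "shared text"))
(def s2 (String. "shared text"))

(= (.hashCode s1) (.hashCode s2))  ; => true, content hashes agree
(identical? s1 s2)                 ; => false, different objects

;; HashMap uses equals/hashCode, so the second put overwrites the first.
(def by-value    (doto (HashMap.)         (.put s1 :first) (.put s2 :second)))
;; IdentityHashMap uses ==, so both keys are kept.
(def by-identity (doto (IdentityHashMap.) (.put s1 :first) (.put s2 :second)))

(.size by-value)     ; => 1
(.size by-identity)  ; => 2
```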

 