[ANN] Longshi a ClojureScript port of Fressian

pe...@bendyworks.com

Aug 1, 2014, 4:33:58 PM
to clojur...@googlegroups.com
I'm happy to announce a port of Fressian to ClojureScript.

The public API mirrors the data.fressian API, with a few exceptions. Records don't have a generic writer, but you can easily add handlers with the write-record function. There is no inheritance lookup for types, so every distinct type needs its own handler. The tagged helper functions (tagged-object?, tag, tagged-value) are not included.

I see the use case for Fressian in ClojureScript as streaming large amounts of data with significant structural similarity. Fressian's caching capabilities allow a large value, once cached, to be represented as a single integer in the bytestream.

This is my first large ClojureScript library, so any suggestions for improvements are welcome.

Thanks to Bendyworks for letting me develop this during my work hours.

Clojars Link: https://clojars.org/longshi
Repo: https://github.com/spinningtopsofdoom/longshi

Peter Schuck

Alex Miller

Aug 2, 2014, 11:23:27 AM
to clojur...@googlegroups.com
Cool stuff, Peter. It would be interesting to compare performance with transit-cljs (https://github.com/cognitect/transit-cljs). Transit has the same caching and extensibility benefits as Fressian but leverages the very fast JavaScript parser capabilities built into the browser, so it is likely faster.

Alex

Sean Grove

Aug 2, 2014, 2:26:46 PM
to clojur...@googlegroups.com
I thought Transit's caching only applied to map keys? I'm pretty unclear on what Fressian's can do.


--
Note that posts from new members are moderated - please be patient with your first post.
---
You received this message because you are subscribed to the Google Groups "ClojureScript" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojurescrip...@googlegroups.com.
To post to this group, send email to clojur...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojurescript.

David Nolen

Aug 2, 2014, 2:49:17 PM
to clojur...@googlegroups.com
In Transit, map keys, symbols, keywords, and tagged value tags are
subject to caching.

Fressian's caching strategy is far more flexible, from what I
understand. That said, transit-cljs is 20-30X faster than
cljs.reader/read-string on the benchmarks I've tried across various
browser and command line JS environments.

David

Alex Miller

Aug 2, 2014, 5:24:19 PM
to clojur...@googlegroups.com
I took "significant structural similarity" to primarily mean at least maps with similar keys, which Transit caching will cover.


Peter Schuck

Aug 3, 2014, 9:51:45 AM
to clojur...@googlegroups.com
Fressian's caching strategy is very flexible and powerful but doesn't have a lot of documentation. I view Fressian's caching as similar to memory references for Clojure values. You can cache any value, no matter how large, and once the value is cached, referencing it again takes only a few bytes. Fressian caches symbols, keywords, and tagged value tags by default.
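
To make the "memory reference" idea concrete, here is a minimal TypeScript sketch. It is not Fressian's wire format or Longshi's API: the names `Token`, `encodeWithCache`, and `decodeWithCache` are hypothetical, and `JSON.stringify` stands in for real structural hashing. It only shows the general technique of writing a value in full once and emitting a small index for each repeat.

```typescript
// Hypothetical illustration of value caching; NOT Fressian's actual encoding.
type Token =
  | { kind: "value"; value: unknown } // full value; implicitly takes the next cache index
  | { kind: "ref"; index: number };   // back-reference to an earlier cached value

function encodeWithCache(values: unknown[]): Token[] {
  const seen = new Map<string, number>(); // structural key -> cache index
  const out: Token[] = [];
  for (const value of values) {
    const key = JSON.stringify(value); // stand-in for structural hashing
    const index = seen.get(key);
    if (index === undefined) {
      seen.set(key, seen.size); // first occurrence: remember it and write it in full
      out.push({ kind: "value", value });
    } else {
      out.push({ kind: "ref", index }); // repeat: write only the index
    }
  }
  return out;
}

function decodeWithCache(tokens: Token[]): unknown[] {
  const cache: unknown[] = []; // rebuilt in the same order the encoder assigned indexes
  return tokens.map((token) => {
    if (token.kind === "value") {
      cache.push(token.value);
      return token.value;
    }
    return cache[token.index];
  });
}
```

The reader rebuilds the same cache in the same order, so no lookup table has to travel with the data. The cost is the hashing step on the write side, which is one of the bottlenecks mentioned below.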

What I mean by "significant structural similarity" is a data structure or stream of data structures that has frequent repetitions of compound values. An AST would probably be a canonical example of this.

Fressian is about five to ten times as slow as Transit and not really suited for the vast majority of applications, where relatively small data structures need to be written and read as quickly as possible. It took significant effort to get Fressian's speed to where it is now, and I don't see a way forward to get it to match Transit's speed. The main bottlenecks are reading and writing numbers and strings, hashing values, and determining a value's type. Until those bottlenecks get resolved in future versions of JavaScript, piggybacking off of JSON, like Transit does, is the way to go for speed.

I hope that answers most questions; feel free to ask for more clarification. I learned a lot about how to write fast ClojureScript / JavaScript and plan on writing a blog post about it in the near future.

Peter Schuck

David Nolen

Aug 3, 2014, 10:14:19 AM
to clojur...@googlegroups.com
RE: writing/reading strings I wonder if this won't help
https://plus.google.com/+AddyOsmani/posts/4GgX9CD6c1X?

David

Peter Schuck

Aug 4, 2014, 10:26:57 AM
to clojur...@googlegroups.com
That would definitely help; a large percentage of the conversion time is spent manually converting strings to UTF-8 bytes and back again. Thanks for pointing that out.
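
For readers unfamiliar with the difference, the TypeScript sketch below contrasts a hand-written UTF-8 encoder with the native route. It assumes the native API in question is `TextEncoder` / `TextDecoder` (named later in this thread); the function names are made up for illustration and are not taken from Longshi.

```typescript
// Hand-written UTF-8 encoder: the kind of per-character loop a binary codec
// needs when no native API is available. Lone surrogates are not replaced
// with U+FFFD here, unlike TextEncoder.
function encodeUtf8Manually(text: string): Uint8Array {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);
    // Combine a surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }
    if (cp < 0x80) {
      bytes.push(cp); // 1 byte
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 4 bytes
    }
  }
  return new Uint8Array(bytes);
}

// Native route: the runtime does the same conversion in optimized code.
function encodeUtf8Natively(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}

function decodeUtf8Natively(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}
```

Both encoders produce the same bytes for well-formed strings, so swapping one for the other should not change the output format, only the time spent producing it.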

Peter Schuck

David Nolen

Aug 4, 2014, 10:30:29 AM
to clojur...@googlegroups.com
Peter,

Cool, let us know if you pursue this and have any numbers to share. It
may be the tipping point for going for binary formats like
Fressian/MessagePack over JSON representations.

David

Francis Avila

Aug 4, 2014, 10:51:15 AM
to clojur...@googlegroups.com
I've looked closely at utf-8 encoders/decoders in JS vs TextEncoder, and I'm not sure fast string encoding/decoding alone is going to bring us the kind of speed improvements we need to make binary formats a clear win over JSON. (Some of this might be because the ArrayBuffer implementations are slower than they could be.)

Relevant JSPerfs (bops is the library used by messagepack-js):

http://jsperf.com/utf8-encoding-methods/2
http://jsperf.com/utf8-decoding-methods/2

Source code for the encoding/decoding methods in the jsperfs:

https://github.com/favila/utfate


That said, this is still exciting.

David Nolen

Aug 4, 2014, 10:58:45 AM
to clojur...@googlegroups.com
On Mon, Aug 4, 2014 at 10:51 AM, Francis Avila <fav...@breezeehr.com> wrote:
> I've looked closely at utf-8 encoders/decoders in JS vs TextEncoder, and I'm not sure fast string encoding/decoding alone is going to bring us the kind of speed improvements we need to make binary formats a clear win over JSON. (Some of this might be because the ArrayBuffer implementations are slower than they could be.)
>
> Relevant JSPerfs (bops is the library used by messagepack-js):
>
> http://jsperf.com/utf8-encoding-methods/2

These numbers look very good, what are your reservations w/ respect to
performance?

David

Francis Avila

Aug 4, 2014, 11:14:58 AM
to clojur...@googlegroups.com
The difference in performance between native and non-native string
encoding (using the fastest js implementations I can manage to write,
granted) is at most 2x to 3x, which is certainly an improvement but
not enough to overcome the approximately order-of-magnitude difference
in overall encoding and decoding speed that we're looking at now.

So the benefit is that using a binary encoding will make sense in more
circumstances, but will still be slow enough that something
JSON-backed is a better choice in most cases.

David Nolen

Aug 4, 2014, 11:20:25 AM
to clojur...@googlegroups.com
On Mon, Aug 4, 2014 at 11:14 AM, Francis Avila <fav...@breezeehr.com> wrote:
> The difference in performance between native and non-native string
> encoding (using the fastest js implementations I can manage to write,
> granted) is at most 2x to 3x, which is certainly an improvement but
> not enough to overcome the approximately order-of-magnitude difference
> in overall encoding and decoding speed that we're looking at now.

Order of magnitude for what? Fressian? Or are you talking about
existing JavaScript MessagePack implementations? It's my impression
these are only 4-5X slower than JSON and that a large contributor to
the slowdown is string decoding.

David

Francis Avila

Aug 4, 2014, 11:56:47 AM
to clojur...@googlegroups.com
I thought MessagePack js implementations were quite a bit slower than
4-5x. Do you know of any benchmarks? String encoding/decoding is
definitely a big part of the slowdown, not disagreeing.

MessagePack doesn't even have a string type: I think it's only
convention that their raw type contains utf-8 bytes if semantically a
string, and this is a big headache for interop. Couldn't transit
conceivably store strings as raw utf16 bytes or even arrays of packed
fixnum/uint16, tag the type at the transit level, and completely
sidestep the string encoding issue?
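
A rough TypeScript sketch of what "storing strings as raw UTF-16 code units" could look like follows. This is purely illustrative and is not something Transit does; `packUtf16` and `unpackUtf16` are hypothetical names. A real wire format would also have to fix a byte order, since a `Uint16Array`'s underlying bytes follow the platform's endianness.

```typescript
// Copy a string's UTF-16 code units straight into a typed array.
// No per-character branching is needed, unlike UTF-8 encoding.
function packUtf16(text: string): Uint16Array {
  const units = new Uint16Array(text.length);
  for (let i = 0; i < text.length; i++) {
    units[i] = text.charCodeAt(i);
  }
  return units;
}

// Rebuild the string from code units, in chunks so that the argument list
// passed to String.fromCharCode stays small.
function unpackUtf16(units: Uint16Array): string {
  const chunkSize = 8192;
  let text = "";
  for (let start = 0; start < units.length; start += chunkSize) {
    const chunk = units.subarray(start, start + chunkSize);
    // Typed arrays are array-like, so apply accepts them at runtime;
    // the cast only satisfies the type checker.
    text += String.fromCharCode.apply(null, chunk as unknown as number[]);
  }
  return text;
}
```

The trade-off is size: text that is mostly ASCII takes two bytes per character this way instead of one under UTF-8.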

David Nolen

Aug 4, 2014, 12:41:02 PM
to clojur...@googlegroups.com
I looked at various things on jsperf like
http://jsperf.com/json-bson-msgpack/2 and ran the benchmarks for
https://github.com/pgriess/node-msgpack on my machine.

David