Edn, Fressian, Transit: which of these formats have a future?


Ryan Schmitt

Mar 15, 2015, 9:40:21 PM
to clo...@googlegroups.com
I'm the author of dynamic-object, an open source library that makes Clojure's data modeling power available to Java programmers. This includes features like serialization and deserialization. I'll copy this small usage example from the README to give you a sense of how it works:

public interface Album extends DynamicObject<Album> {
  @Key(":artist") String getArtist();
  @Key(":album")  String getAlbum();
  @Key(":tracks") int getTracks();
  @Key(":year")   int getYear();
}

String edn = "{:artist \"Meshuggah\", :album \"Chaosphere\", :tracks 8, :year 1998}";
Album album = DynamicObject.deserialize(edn, Album.class);
album.getArtist(); // => "Meshuggah"

dynamic-object has always been opinionated about using Edn as the primary data language on the wire, for a number of reasons. For a long time, I also thought about adding Fressian support to dynamic-object, and I've recently done so on an experimental basis. (It looks like this.) Some time after I initially released dynamic-object, Transit was also released, with support for various encodings (JSON, JSON-Verbose, MessagePack).

In working (to different extents) with these data languages, I've had some apprehensions about all of them.
  • There is a lack of tooling available for Edn, such as validators and pretty-printers. I spent a while looking for an Edn equivalent of python -mjson.tool and never found one. Clojure's built-in pprint function does not work out-of-the-box to pretty print arbitrary values, and also appears to handle some data structures, such as records, incorrectly. (pprint omits reader tags when printing records.) pprint's underlying implementation, cl-format, is extremely powerful and could almost certainly be used to build a validating Edn pretty-printer, but it would have an unacceptably long startup time.
  • There is a lack of high-quality Edn implementations for different languages. Because the Edn spec is not very formal or complete, there seems to be some uncertainty regarding what constitutes an Edn implementation in the first place. For instance, clojure.edn parses the Ratio type as a builtin, even though it is mentioned nowhere in the spec. (Issue.) There are also oddities such as the recommended C++ implementation describing itself as "experimental."
  • Fressian's reference Java implementation is almost totally undocumented. This is a problem, because I'm writing a library that targets Java developers; they won't be going through the Clojure bindings (which are decently documented). Fressian's source code is outstanding, but it's still not documentation.
  • Due to the lack of documentation, it's not clear which parts of Fressian are actually stable. Stuart Halloway's data.fressian talk included some parentheticals about the extension points being subject to change, which so far they haven't, but that might only be because of the following point...
  • Fressian does not seem to have gotten any attention since the initial launch. People have submitted GitHub issues, including one surprisingly high-quality bug report, but they have all been ignored. The JIRA is mostly tumbleweeds.
  • The Clojure bindings for Fressian, namely data.fressian, are essentially incomplete. With the exception of maps, Clojure collection types do not round trip, and in at least one case (vectors) that is because of a blocking issue in the underlying Fressian implementation.
  • There are no documented best practices for the use of Fressian or some of its more advanced features like chunking. It is not clear how to read and write Fressian in a way that facilitates (for instance) ranged reads from the middle of a resource. It is not clear when checksums should be used and how they should be validated. It is not clear whether tags should be namespaced, or how. The only namespaced tag in data.fressian is for IRecord; none of the other type tags are namespaced. It's not clear whether this is due to bugwards compatibility.
  • Transit is advertised as a work-in-progress. This is the main reason I haven't seriously considered adding Transit support to dynamic-object.
  • However, what happens when Transit is stabilized (if that ever happens)? Since Transit offers a msgpack encoding, will Fressian then be irrelevant (except for legacy use cases)? There's a FUD aspect here--I like Fressian and I want dynamic-object to support it, but I don't want to back the wrong pony and end up having to support HD-DVD and Betamax for all time (so to speak).
  • Can these formats be unified? Can Edn and Fressian encodings for Transit be offered? Would that even accomplish anything?

I realize that none of these data languages will have the same extent of support and tooling as JSON or XML, but I want to ensure that dynamic-object's supported data languages all have attentive stewardship and bright futures. It's distressing that a lot of the issues with Edn and Fressian have not gotten much traction. Are these languages still actively being supported and fostered? If so, how much development activity is taking place on internal forks? Are any public updates planned for these languages any time soon?

Alex Miller

Mar 15, 2015, 10:51:55 PM
to clo...@googlegroups.com
Hi Ryan,

To answer the big question, all of these data formats are in active use at Cognitect and while I make no promises, I expect them all to be alive and active for the knowable future. Each of them targets a different niche but all of them share the qualities of transmitting extensible typed values.

edn is the best choice for human-readable data. It is, however, less efficient to transmit and depends on writing a high-performance parser - this is a high bar in some language environments. edn is most attractive right now to Clojure users b/c of its proximity to Clojure itself. While it has many advantages as an extensible readable literal data format, it's an uphill battle to sell that against other data formats that already have greater mindshare and tooling in other language communities.

fressian is the highest performance option - it takes full advantage of a number of compression tricks and has support for arbitrary user-extensible caching. Again, it requires a fair amount of effort to write a fressian library, though, so it's probably best for JVM-oriented endpoints right now. By seeking the greatest performance, fressian also makes tradeoffs that narrow its range of use and interest group.
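To make the caching idea concrete, here is a minimal sketch of write-side caching in general: the first occurrence of a value is written in full and remembered, and later occurrences are written as a short back-reference. This is only an illustration. The class name, the "^" marker, and the token layout are all made up for this example and are not Fressian's (or Transit's) actual wire encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of value caching; NOT a real wire format.
final class CachingSketch {
    // Marker for a back-reference, chosen arbitrarily for this sketch.
    static final String REF = "^";

    // Replaces every repeated value with REF + the index of its first occurrence.
    static List<String> encode(List<String> values) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith(REF)) {
                // Assumption of this sketch: no escaping, so such values are rejected.
                throw new IllegalArgumentException("value may not start with " + REF);
            }
            Integer index = seen.get(v);
            if (index != null) {
                out.add(REF + index);      // repeat: emit the short reference
            } else {
                seen.put(v, seen.size());  // first sighting: remember its slot
                out.add(v);                // ...and emit the value in full
            }
        }
        return out;
    }

    // Rebuilds the original values by mirroring the writer's cache table.
    static List<String> decode(List<String> tokens) {
        List<String> table = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.startsWith(REF)) {
                int index = Integer.parseInt(t.substring(REF.length()));
                out.add(table.get(index));
            } else {
                table.add(t);
                out.add(t);
            }
        }
        return out;
    }
}
```

The reader never receives the cache table explicitly; it rebuilds the same table by processing tokens in the same order as the writer, which is why the two sides have to agree on exactly what gets cached.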

transit is a pragmatic midpoint between these two. It focuses, like fressian, on program-to-program data transmission; however, the data can be made readable (like edn) via the json-verbose mode. Like fressian, transit contains caching capabilities but they are more limited and not user-extensible. transit is designed primarily to have the most high-quality implementations per lowest effort - effectively shooting for greater reach than either edn or fressian by lowering the bar to implementation. The bar is lowered by reusing high-performance parsers for either JSON or messagepack, which exist in a large number of languages. Of particular importance is leveraging the very high performance JSON parsers available in JavaScript runtimes, making transit viable as a browser-side endpoint for a fraction of the effort required to write a high performance edn or fressian endpoint. As transit explicitly seeks reach and portability, it is naturally the format with the broadest potential usage.
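As a rough illustration of what reusing a JSON parser means in practice: richer types can ride inside ordinary JSON strings by way of a short prefix, so the host JSON parser does the heavy lifting and only a thin layer is needed on top. The sketch below is loosely modeled on Transit's string-tagging convention ("~:" for keywords, "~$" for symbols, and a leading "~" to escape strings that begin with a reserved character). Treat the exact prefixes as assumptions to check against the Transit spec; the class and method names are hypothetical, and cache codes and all other tags are ignored.

```java
// Hypothetical sketch of typed values carried in plain JSON strings.
final class TaggedStringSketch {
    // Result of decoding one JSON string: a kind label plus the payload.
    static final class Decoded {
        final String kind;
        final String value;
        Decoded(String kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Encodes a keyword name (without the leading colon) as a tagged string.
    static String encodeKeyword(String name) {
        return "~:" + name;
    }

    // Encodes a plain string, escaping it if it starts with a reserved character.
    static String encodeString(String s) {
        if (s.startsWith("~") || s.startsWith("^") || s.startsWith("`")) {
            return "~" + s;
        }
        return s;
    }

    // Interprets a string that the JSON parser has already extracted.
    static Decoded decode(String s) {
        if (s.startsWith("~~") || s.startsWith("~^") || s.startsWith("~`")) {
            return new Decoded("string", s.substring(1)); // escaped plain string
        }
        if (s.startsWith("~:")) {
            return new Decoded("keyword", s.substring(2));
        }
        if (s.startsWith("~$")) {
            return new Decoded("symbol", s.substring(2));
        }
        if (s.startsWith("~")) {
            return new Decoded("unknown-tag", s); // tag this sketch does not handle
        }
        return new Decoded("string", s);
    }
}
```

With this scheme, a map key such as :artist would travel as the JSON string "~:artist", and the receiving side would turn it back into a keyword after the JSON parser has finished.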

Hopefully that lays out the landscape a bit more. With respect to support, all of these formats are supported by Cognitect. Like everyone else, we're managing priorities across a large number of projects. Filing issues is great, voting on issues is great, reports like this are great. External tooling is welcome (like pretty printers, etc). Because we use these projects inside internal Cognitect projects and products, we do not currently have a community contribution model for most of these as there may be impacts that are not publicly visible. That could change in the future.

I am not an expert on everything you mention below but at a glance it looks like on-point feedback and I will raise it with the appropriate people and we can follow back here or on the relevant issues as needed.

Thanks,
Alex

Lucas Bradstreet

Mar 17, 2015, 12:30:45 AM
to clo...@googlegroups.com
I have had a lot of success with nippy https://github.com/ptaoussanis/nippy, which is quite well documented. We had problems with fressian round tripping Clojure collections, as you've described.

Nippy also has a number of other features (compression, encryption) that I may end up using. It pays attention to backwards compatibility, and there is a mode to turn this backwards compatibility off if it is not required.

Lucas 
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Andrey Antukh

Mar 17, 2015, 5:26:36 AM
to clo...@googlegroups.com
+1 nippy
I have successfully used it in a few of my projects.

Ryan Schmitt

Mar 20, 2015, 3:29:03 AM
to clo...@googlegroups.com
Nippy looks like an interesting project; I wasn't familiar with it. However, it seems to be very Clojure-centric. Which is fine, at least for certain use cases; it just doesn't really occupy the part of the serialization format design space that I'm most interested in for dynamic-object.

Ryan Schmitt

Mar 20, 2015, 3:32:17 AM
to clo...@googlegroups.com
Thanks for this breakdown; a lot of what you're saying about Transit is stuff I had inferred from prior announcements, but it's still enlightening to see an explicit comparison to Edn and Fressian.