clojure.data.json is monstrously slow?

Andy Xue

May 23, 2012, 4:42:26 PM
to cascalog-user
I just ran a side-by-side test: clojure.data.json 0.1.1 vs. clj-json
0.5.0 (a Clojure wrapper around the codehaus.jackson JSON parser).

data: ~20 GB, each line of data is a JSON map.
cluster: 20 c1.medium instances (Amazon AWS)

clj-json finished in 7 minutes!
clojure.data.json finished in 4 hrs!

Here are the code fragments from the tests:

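;; The elided parts presumably include the Cascalog requires
;; (cascalog.api for <-, cascalog.ops aliased as c).
(ns ...
  (:require
    [clojure.data.json :as json]
    [clj-json.core :as jacksonjson])
  ...)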

(defn json-conversion-count [path]
  (let [src (get-hfs-multitap path)]
    (<- [?count]
        (src !line)
        (json/read-json !line :> !data-map)
        (c/count :> ?count)
        (:distinct false))))

(defn jackson-json-conversion-count [path]
  (let [src (get-hfs-multitap path)]
    (<- [?count]
        (src !line)
        (jacksonjson/parse-string !line :> !data-map)
        (c/count :> ?count)
        (:distinct false))))

I am surprised by the difference. I feel like only Cascalog users
would use clojure.data.json, so I thought I would put this up here as a
PSA. Please let me know if anyone else has had a similar or different
experience with it.
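
A quick local check can help separate raw parser cost from cluster overhead before committing to another multi-hour run. The following is only a minimal sketch: the names time-parse and compare-parsers are hypothetical and not from the post, and it takes the parse functions as arguments so it does not depend on either library. It also does no JIT warm-up, so the numbers should be read as rough indications only.

```clojure
;; Sketch of a local timing harness. time-parse and compare-parsers are
;; hypothetical names, not part of any library mentioned in this thread.

(defn time-parse
  "Calls parse-fn on line n times and returns the elapsed time in milliseconds."
  [parse-fn line n]
  (let [start (System/nanoTime)]
    (dotimes [_ n]
      (parse-fn line))
    (/ (- (System/nanoTime) start) 1e6)))

(defn compare-parsers
  "Takes a map of label -> parse function and returns a map of
  label -> elapsed milliseconds for n parses of line."
  [parsers line n]
  (into {}
        (for [[label parse-fn] parsers]
          [label (time-parse parse-fn line n)])))
```

With the namespaces aliased as in the fragments above, a call might look like `(compare-parsers {:data-json json/read-json, :clj-json jacksonjson/parse-string} line 100000)`, where `line` is one representative line of the input data.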

Sam Ritchie

May 23, 2012, 4:47:58 PM
to cascal...@googlegroups.com
I think Cheshire's supposed to be really good:

--
Sam Ritchie, Twitter Inc
@sritchie09

(Too brief? Here's why! http://emailcharter.org)

Marshall T. Vandegrift

May 24, 2012, 6:03:12 PM
to cascal...@googlegroups.com
Andy Xue <and...@lumoslabs.com> writes:

> I just ran a side by side test, clojure.data.json 0.1.1 vs. clj-json
> 0.5.0 (clojure wrapper on the codehaus.jackson json parser) package

Is there a particular reason you're using clojure.data.json 0.1.1?
Version 0.1.2 introduced a 5-10x speedup on basic benchmarks:

https://groups.google.com/forum/#!msg/clojure/gOrDeQ9bxl4/2ZuhCv6CNGcJ

And 0.1.3 -- which isn't mentioned on the github page, but does exist --
appears to have fixed a few bugs.
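
Assuming the project is built with Leiningen (the thread does not say), trying the newer release would be a one-line change to the dependency list in project.clj. This is only a fragment; the rest of the project definition is omitted.

```clojure
;; project.clj fragment (assumes Leiningen); other entries omitted
:dependencies [[org.clojure/data.json "0.1.3"]]
```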

--
Marshall T. Vandegrift <lla...@damballa.com>
Damballa Staff Software Engineer | 518.859.4559m

Andy Xue

May 29, 2012, 5:13:28 PM
to cascalog-user
0.1.1 was the most current version when I put it into my project (months
ago); I guess I just never noticed how terrible it was until recently.
Also, my tests showed that it was running ~40x slower than clj-json,
which makes me think that it's not even in the same ballpark of
performance, at least for the data I am working with.

Jason Toy

Jun 8, 2012, 8:25:00 PM
to cascalog-user
Yes, use Cheshire for all JSON.
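
For reference, a minimal sketch of what the swap might look like, assuming Cheshire is on the classpath. The namespace name and the parse-line wrapper are hypothetical; `cheshire.core/parse-string` is Cheshire's string parser and returns maps with string keys by default.

```clojure
(ns example.cheshire-parse            ; hypothetical namespace name
  (:require [cheshire.core :as cheshire]))

(defn parse-line
  "Parses one line of JSON text into Clojure data (string keys)."
  [line]
  (cheshire/parse-string line))
```

In the queries from the first post, the parsing predicate would then become `(parse-line !line :> !data-map)`, with the rest of the query unchanged.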