Amazonica performance: options?

Dave Tenny

Mar 28, 2014, 10:59:48 AM
to clo...@googlegroups.com
I'm trying to code some amazonica-based solutions in a nontrivial AWS environment.
I work with many AWS accounts, and it isn't unusual to see a thousand instances running on one account, with similar excesses in other types of AWS resources.  So if you're doing an ec2-describe-instances (or the amazonica equivalent), it must not choke in this environment.

I like the way amazonica does all the bean marshalling for me so I can express queries simply.  But the returned datasets need to be more pragmatic/performant.

The problem for me is that Amazonica doesn't seem up to the task of dealing with queries that return large volumes of data.
I suspect it has nothing to do with reflection, and more to do with the unwieldy amount of duplicate information produced by the result unmarshalling process.
The "clojure all the way down" philosophy results in duplicated information, and just printing the result to a file takes a long time.
If I accidentally let the output go to an emacs cider repl buffer, things get so wedged that I may as well kill -9 emacs.
(Known cider repl issues here; it isn't all amazonica.)
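One general Clojure mitigation for the wedged-REPL problem (not an amazonica feature) is to cap how much of a large collection the printer emits, using the standard `*print-length*` and `*print-level*` dynamic vars. A minimal sketch; `preview` is a hypothetical helper name and the limits are arbitrary:

```clojure
;; Print at most 3 elements of any collection and nest at most 2 levels
;; deep, so a huge nested result cannot flood the REPL buffer.
(defn preview
  "Hypothetical helper: returns a truncated printed representation of x."
  [x]
  (binding [*print-length* 3
            *print-level*  2]
    (pr-str x)))

(preview (range 10))
;; => "(0 1 2 ...)"
```

In an interactive REPL session, which binds these vars per thread, `(set! *print-length* 20)` applies a similar cap for the rest of the session. This only limits what gets printed; it does not make the underlying request any faster.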

For example, here's how long it takes to run the Java-based EC2 CLI to describe images on an account:

$ time ec2-describe-images >/tmp/ec2-cli-images.out

real    0m11.484s
user    0m2.564s
sys     0m0.129s


And here's how long it takes from a 'lein repl' to run the same query on the same account:

(time (with-output ["/tmp/clj-awz-images.out"] (println (ec2/describe-images))))
"Elapsed time: 194685.552683 msecs"

Now, the amount of data printed by the EC2 CLI is of course much different from the output from Amazonica:
amazonica returns everything in gory, duplicated map detail and the EC2 CLI does not, as the relative output sizes show:

-rw-rw-r--.  1 dave dave 17201290 Mar 28 10:35 clj-awz-images.out
-rw-rw-r--.  1 dave dave    99342 Mar 28 10:26 ec2-cli-images.out.11.5s

Where the amazonica output starts with:
{:images [{:hypervisor xen, :state available, :virtualization-type paravirtual, :root-device-type instance-store,
... and goes on like that with duplicate keywords all the way down.

Anyway, my goal isn't to turn amazonica into the ec2 cli.  But even the most trivial operations in amazonica (especially the most trivial, i.e. those lacking filters against large data sets) pretty much whack me left and right
with CPU-wedged tools and (completely unacceptable) long waits for results.

Any suggestions on how to use amazonica in a way where the output is ... different, and minimal/workable?

Or am I left with switching to another package or writing directly against the Java SDK APIs myself?

I'm pretty sure the results need to be structures whose relationship to data values is implicit (not explicit in map keys). However, I don't see any options in amazonica to change this.
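A positional representation like this can be built in plain Clojure on top of amazonica's maps: choose a fixed column order once, then reduce each result map to a vector of values, so each key appears only once, in the column list. This is a sketch under assumptions: `as-table` is a hypothetical helper, the sample data is a stand-in shaped like the `:images` output shown above, and the `:image-id` key is assumed rather than taken from that output.

```clojure
(defn as-table
  "Hypothetical helper: turns a seq of maps into {:columns [...] :rows [[...] ...]}.
  Keys are stored once in :columns; each row holds only values, by position."
  [columns maps]
  {:columns columns
   ;; (apply juxt columns) builds a fn returning [(k1 m) (k2 m) ...]
   :rows    (mapv (apply juxt columns) maps)})

;; Stand-in data; a live result would come from (:images result).
(def sample-images
  [{:image-id "ami-1" :state "available" :hypervisor "xen" :root-device-type "instance-store"}
   {:image-id "ami-2" :state "pending"   :hypervisor "xen" :root-device-type "ebs"}])

(as-table [:image-id :state :root-device-type] sample-images)
```

This shrinks what is kept around and printed, but the full response is still retrieved and unmarshalled first, so it does not shorten the request itself.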

Thanks for suggestions, forgive me if I've missed something obvious.  I'm just trying to see what's out there and at the same time move along quickly enough that I can get some usable tools for work (so I can lose all my python and bash scripts for various interfaces, I want clojure!).

- Dave


Michael Cohen

Mar 28, 2014, 12:52:57 PM
to clo...@googlegroups.com
time ec2-describe-images -a > ec2-cli-images.txt

real  1m26.401s
user  0m6.551s
sys 0m1.159s

and writes a 7.5MB file to disk. Note the -a flag, to list all of the available public images.

in a repl,

(time (spit "clj-awz-images.txt" (describe-images)))

"Elapsed time: 90258.47 msecs"

and writes an 18MB file to disk containing all the available public images. 

Am I missing something? 

You can also pass a list of filters to the call to narrow the result.
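A sketch of narrowing the request on the server side. It assumes that amazonica maps keyword arguments onto the setters of the Java SDK's `DescribeImagesRequest` (its owners and filters properties), as it does for other request types; the owner alias and filter values are illustrative only. `describe-own-available-images` is a hypothetical wrapper that takes the describe function as an argument, so the snippet can be exercised without AWS credentials.

```clojure
(defn describe-own-available-images
  "Hypothetical wrapper: calls describe-fn (e.g. amazonica.aws.ec2/describe-images)
  asking only for images owned by this account that are in the available state."
  [describe-fn]
  (describe-fn :owners  ["self"]
               :filters [{:name "state" :values ["available"]}]))

;; Real usage (assumes amazonica on the classpath and configured credentials):
;; (require '[amazonica.aws.ec2 :as ec2])
;; (describe-own-available-images ec2/describe-images)
```

Passing the filters to EC2 means the large unfiltered image list is never transferred or unmarshalled at all, unlike trimming the result afterwards.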

Dave Tenny

Mar 28, 2014, 3:06:40 PM
to clo...@googlegroups.com
Actually, let me withdraw the question for now.  If I call an unfiltered (describe-images) on my account I'll get ~27,900 images.  It takes 70 seconds to retrieve them using the Java API (from clojure).

If I then print (str image) for all those images to a file, that adds another 153 seconds, for a total of 223 seconds.  Presumably that's the normal Java toString() method invocation.

If I print out the Amazonica version of it, it takes 195 seconds, presumably because we're sharing keyword references internally and so abusing memory less overall (just a wild guess).

So if I do the native calls and cherry pick the information I want (like the java EC2 CLI does), then I can get the time down significantly.
Otherwise Amazonica is probably doing a reasonable job given what I'm asking of it.
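The same cherry-picking can be applied to amazonica's results before printing: write one tab-separated line per image with only the wanted columns, roughly in the spirit of the CLI's compact output, instead of printing the whole nested structure. A minimal sketch; `write-images-tsv` and the column choice are hypothetical, and this only shortens the printing step, not the ~70 s retrieval.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn write-images-tsv
  "Hypothetical helper: writes the selected columns of each image map to path,
  one tab-separated line per image, streaming instead of building one big string."
  [path columns images]
  (with-open [^java.io.Writer w (io/writer path)]
    (doseq [img images]
      ;; missing keys print as empty fields
      (.write w (str (str/join \tab (map #(str (get img %)) columns)) \newline)))))

;; Possible usage against a live result (assumes amazonica and credentials):
;; (write-images-tsv "/tmp/images.tsv" [:image-id :state :name]
;;                   (:images (ec2/describe-images :owners ["self"])))
```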

And, in the wisdom gained department, never do unfiltered (describe-images) requests if you can help it :-)




--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
