very large datasets with clojure.contrib.sql

57 views
Skip to first unread message

Justin Balthrop

unread,
Jul 7, 2010, 6:29:56 PM7/7/10
to Clojure Dev
Last night, I ran into an issue with clojure.contrib.sql:

If you have a very large dataset you will run out of memory because by
default, it pulls the entire result set into memory (even though it
creates a lazy-seq using resultset-seq).

This issue has been discussed previously here:
http://groups.google.com/group/clojure/browse_thread/thread/7b0c250e0ba6c9eb/fb9001522b49c20a

The fix is pretty simple, just call (.setFetchSize stmt 1). But,
clojure.contrib.sql/with-query-results doesn't give you a way to do
that currently. So I've created a patch that supports setting any of
the attributes on stmt in the call to with-query-results. Here's the
code:

http://github.com/ninjudd/clojure-contrib/commit/332fe019c864fcd2f052a8bd13340c0ec259e5c4

Now you can do this:

(with-connection {…}
(.setAutoCommit (sql/connection) false) ;; needed for postgres
(with-query-results results ["SELECT id, data FROM nodes"]
{:fetch-size 1000}
(doseq [r results]
…)))

This code used to run out of memory very quickly, but now it handles
huge datasets with 100 million records with no problems.

In the process, I added clojure.contrib.bean. It can be used like
this:

(update-bean instance {:foo-bar 6 :baz "five"})
;; (.setFooBar instance 6)
;; (.setBaz instance "five")

Setting bean attributes is something I find myself needing to do a lot
lately, so instead of duplicating the code throughout each project, it
seemed to make sense to have the functionality be in contrib. The code
originally came from Lancet, but was modified to be more efficient.

So, what does everyone think? I'd like to submit this to be included
in contrib. Should I open a ticket for it on Assembla?

Thanks,
Justin

Stephen C. Gilardi

unread,
Jul 9, 2010, 1:13:02 AM7/9/10
to cloju...@googlegroups.com

On Jul 7, 2010, at 6:29 PM, Justin Balthrop wrote:

> Last night, I ran into an issue with clojure.contrib.sql:
>
> If you have a very large dataset you will run out of memory because by
> default, it pulls the entire result set into memory (even though it
> creates a lazy-seq using resultset-seq).
>
> This issue has been discussed previously here:
> http://groups.google.com/group/clojure/browse_thread/thread/7b0c250e0ba6c9eb/fb9001522b49c20a
>
> The fix is pretty simple, just call (.setFetchSize stmt 1). But,
> clojure.contrib.sql/with-query-results doesn't give you a way to do
> that currently. So I've created a patch that supports setting any of
> the attributes on stmt in the call to with-query-results. Here's the
> code:
>
> http://github.com/ninjudd/clojure-contrib/commit/332fe019c864fcd2f052a8bd13340c0ec259e5c4
>
> Now you can do this:
>
> (with-connection {…}
> (.setAutoCommit (sql/connection) false) ;; needed for postgres
> (with-query-results results ["SELECT id, data FROM nodes"]
> {:fetch-size 1000}
> (doseq [r results]
> …)))

Please do open an issue. I'll take a look at the fix. Do you have a contributor agreement in place?

--Steve

Justin Balthrop

unread,
Jul 9, 2010, 1:31:19 PM7/9/10
to Clojure Dev
Yeah, I have a CA. Here's the ticket with the patch:
https://www.assembla.com/spaces/clojure-contrib/tickets/88-clojure-contrib-sql-runs-out-of-memory-on-very-large-datasets


On Jul 8, 10:13 pm, "Stephen C. Gilardi" <squee...@mac.com> wrote:
> On Jul 7, 2010, at 6:29 PM, Justin Balthrop wrote:
>
>
>
>
>
> > Last night, I ran into an issue with clojure.contrib.sql:
>
> > If you have a very large dataset you will run out of memory because by
> > default, it pulls the entire result set into memory (even though it
> > creates a lazy-seq using resultset-seq).
>
> > This issue has been discussed previously here:
> >http://groups.google.com/group/clojure/browse_thread/thread/7b0c250e0...
>
> > The fix is pretty simple, just call (.setFetchSize stmt 1). But,
> > clojure.contrib.sql/with-query-results doesn't give you a way to do
> > that currently. So I've created a patch that supports setting any of
> > the attributes on stmt in the call to with-query-results. Here's the
> > code:
>
> >  http://github.com/ninjudd/clojure-contrib/commit/332fe019c864fcd2f052...
Reply all
Reply to author
Forward
0 new messages