This Malcolm's reply in a nearby thread:
> Unless you're
> using really huge querysets, the memory usage is not going to kill you.
> Pulling back the huge number of results already uses a bunch of memory
> and that's a property of the db wrapper.
... has reminded me that this behavior was bothering me for some time.
In fact pulling everything into a local memory on .execute('select') is
not an inherent property of a db wrapper. MySQLdb and psycopg2 can do
server-side cursors and I'm pretty sure Oracle's driver is as able.
There even was a discussion about this feature [1] where Malcolm has
raised some practical questions. I believe this feature is very useful
in some cases where one wants to iterate over a queryset and count
something or dump its content into another storage. So is there an
interest in implementing this behavior to be controllable from queryset
interface?
[1]:
http://groups.google.com/group/django-developers/browse_frm/thread/f7b78b399c478cf
So, once again, it was Jeremy stirring the pot and it's you keeping the
issue alive a couple of years later. Clearly I have to create a whole
new category of troublemaker in my little book. The current severity
levels don't go high enough. :-)
I guess you should be thanked for having a good memory, though. But
don't think that's getting you back in the good books.
> I believe this feature is very useful
> in some cases where one wants to iterate over a queryset and count
> something or dump its content into another storage. So is there an
> interest in implementing this behavior to be controllable from queryset
> interface?
>
> [1]:
> http://groups.google.com/group/django-developers/browse_frm/thread/f7b78b399c478cf
I think the comments and requests for information Younger Me made in
that thread are still valid. These days I would also throw in checking
how concurrent cursor behaviour on the same connection works. We already
know that SQLite cannot handle creating a new result set / cursor when
an existing cursor is only partially read and I wonder if there are
similar issues like that (again, it's related to lifecycle and resource
management of the server-side data).
As I wrote that previous paragraph, I also remembered [2], mentioning
unbuffered cursors in MySQLdb (at least in the future, as of a year ago)
and the need for one cursor at a time in that case. I haven't
investigated that any deeper than re-reading the post just now, but it's
something I would want to understand before making decisions about what
an API looks like here.
I suspect it is probably possible to make this work. How much it
complicates Django internals and userspace usage is something to
understand first, though.
[2] http://mysql-python.blogspot.com/2008/03/i-am-not-dead.html
Regards,
Malcolm
Well, the issue wasn't resolved. What could I do? :-)
> I think the comments and requests for information Younger Me made in
> that thread are still valid. These days I would also throw in checking
> how concurrent cursor behaviour on the same connection works. We already
> know that SQLite cannot handle creating a new result set / cursor when
> an existing cursor is only partially read and I wonder if there are
> similar issues like that (again, it's related to lifecycle and resource
> management of the server-side data).
I think we can solve the whole issue of different concurrency levels
within one connection among backends just by having more than one
connection. I.e. make a new connection for each queryset requested with
a server-side flag.
API may be as simple as:
MyModel.objects.all().server_side()
which sould trigger a flag in a queryset instance that is then passed
onto backends cursor() method. This will require changing all backends
to hold a list of opened connections but I thinks it's not too
complicated. Luckily there's no public API in Django for accessing
connection objects directly. There's db._connection but it has '_'
before it for a reason :-)
Another option is to reuse an old proposal of Simon Willison on .using()
method for querysets that switches it between several configured
databases. Then we can invent a new setting system for multiple DBs like:
DATABASE = {
'default': ...,
'huge_archive': {
'name': ...,
'user_server_side_cursor': True,
}
}
And then a queryset can `.using('huge_archive')` to use server-side
cursors. But I like this thing less because it's more complex and it
confuses into one method slightly different things.
> As I wrote that previous paragraph, I also remembered [2], mentioning
> unbuffered cursors in MySQLdb (at least in the future, as of a year ago)
> and the need for one cursor at a time in that case.
I believe Andy was talking about MySQLdb > 1.2 but we're still on 1.2 so
it's not something I'd consider here.