Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
Pulling data from DB into memory
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  3 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Ivan Sagalaev  
View profile  
 More options Feb 18 2009, 3:20 am
From: Ivan Sagalaev <man...@softwaremaniacs.org>
Date: Wed, 18 Feb 2009 11:20:39 +0300
Local: Wed, Feb 18 2009 3:20 am
Subject: Pulling data from DB into memory
Hello!

This Malcolm's reply in a nearby thread:

> Unless you're
> using really huge querysets, the memory usage is not going to kill you.
> Pulling back the huge number of results already uses a bunch of memory
> and that's a property of the db wrapper.

... has reminded me that this behavior was bothering me for some time.
In fact pulling everything into a local memory on .execute('select') is
not an inherent property of a db wrapper. MySQLdb and psycopg2 can do
server-side cursors and I'm pretty sure Oracle's driver is as able.

There even was a discussion about this feature [1] where Malcolm has
raised some practical questions. I believe this feature is very useful
in some cases where one wants to iterate over a queryset and count
something or dump its content into another storage. So is there an
interest in implementing this behavior to be controllable from queryset
interface?

[1]:
http://groups.google.com/group/django-developers/browse_frm/thread/f7...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Malcolm Tredinnick  
View profile  
 More options Feb 18 2009, 5:36 pm
From: Malcolm Tredinnick <malc...@pointy-stick.com>
Date: Thu, 19 Feb 2009 09:36:36 +1100
Local: Wed, Feb 18 2009 5:36 pm
Subject: Re: Pulling data from DB into memory

So, once again, it was Jeremy stirring the pot and it's you keeping the
issue alive a couple of years later. Clearly I have to create a whole
new category of troublemaker in my little book. The current severity
levels don't go high enough. :-)

I guess you should be thanked for having a good memory, though. But
don't think that's getting you back in the good books.

> I believe this feature is very useful
> in some cases where one wants to iterate over a queryset and count
> something or dump its content into another storage. So is there an
> interest in implementing this behavior to be controllable from queryset
> interface?

> [1]:
> http://groups.google.com/group/django-developers/browse_frm/thread/f7...

I think the comments and requests for information Younger Me made in
that thread are still valid. These days I would also throw in checking
how concurrent cursor behaviour on the same connection works. We already
know that SQLite cannot handle creating a new result set / cursor when
an existing cursor is only partially read and I wonder if there are
similar issues like that (again, it's related to lifecycle and resource
management of the server-side data).

As I wrote that previous paragraph, I also remembered [2], mentioning
unbuffered cursors in MySQLdb (at least in the future, as of a year ago)
and the need for one cursor at a time in that case. I haven't
investigated that any deeper than re-reading the post just now, but it's
something I would want to understand before making decisions about what
an API looks like here.

I suspect it is probably possible to make this work. How much it
complicates Django internals and userspace usage is something to
understand first, though.

[2] http://mysql-python.blogspot.com/2008/03/i-am-not-dead.html

Regards,
Malcolm


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ivan Sagalaev  
View profile  
 More options Feb 18 2009, 6:47 pm
From: Ivan Sagalaev <man...@softwaremaniacs.org>
Date: Thu, 19 Feb 2009 02:47:28 +0300
Local: Wed, Feb 18 2009 6:47 pm
Subject: Re: Pulling data from DB into memory

Malcolm Tredinnick wrote:
> So, once again, it was Jeremy stirring the pot and it's you keeping the
> issue alive a couple of years later.

Well, the issue wasn't resolved. What could I do? :-)

> I think the comments and requests for information Younger Me made in
> that thread are still valid. These days I would also throw in checking
> how concurrent cursor behaviour on the same connection works. We already
> know that SQLite cannot handle creating a new result set / cursor when
> an existing cursor is only partially read and I wonder if there are
> similar issues like that (again, it's related to lifecycle and resource
> management of the server-side data).

I think we can solve the whole issue of different concurrency levels
within one connection among backends just by having more than one
connection. I.e. make a new connection for each queryset requested with
a server-side flag.

API may be as simple as:

     MyModel.objects.all().server_side()

which sould trigger a flag in a queryset instance that is then passed
onto backends cursor() method. This will require changing all backends
to hold a list of opened connections but I thinks it's not too
complicated. Luckily there's no public API in Django for accessing
connection objects directly. There's db._connection but it has '_'
before it for a reason :-)

Another option is to reuse an old proposal of Simon Willison on .using()
method for querysets that switches it between several configured
databases. Then we can invent a new setting system for multiple DBs like:

     DATABASE = {
         'default': ...,
         'huge_archive': {
             'name': ...,
             'user_server_side_cursor': True,
         }
     }

And then a queryset can `.using('huge_archive')` to use server-side
cursors. But I like this thing less because it's more complex and it
confuses into one method slightly different things.

> As I wrote that previous paragraph, I also remembered [2], mentioning
> unbuffered cursors in MySQLdb (at least in the future, as of a year ago)
> and the need for one cursor at a time in that case.

I believe Andy was talking about MySQLdb > 1.2 but we're still on 1.2 so
it's not something I'd consider here.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »