I need to run a fairly CPU-intensive calculation nightly over a
dataset that's already large and growing quickly. I'm planning to run
this via a cron job, but would like to make sure that it neither eats
up the entire CPU nor locks the database, so that my site can continue
functioning in the meantime. The rough outline of what it needs to do
is as follows:
class OtherThing(models.Model):
    anotherthing = models.ManyToManyField(Whatever)
    ...

class Thing(models.Model):
    other_things = models.ManyToManyField(OtherThing, through='SomethingElse')
    ...

for thing in Thing.objects.select_related('other_things',
                                          'other_things__anotherthing__etc'):
    # this mainly involves serialization to a great depth
    calculated = calculation_on_thing_and_its_otherthings(thing)
    thing.calculated_data = calculated
    thing.save()
Will the above approach lock the database for a while or eat tons of
CPU? Any suggestions? I'm using Django 1.2, btw.
Thanks,
-Nan
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django...@googlegroups.com.
To unsubscribe from this group, send email to django-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
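One way to address the "eats the entire CPU / locks the database" worry in the loop above is to fetch and process the rows in bounded primary-key batches, so that each SELECT returns (and releases any lock) quickly and the heavy calculation happens between queries. A minimal, hedged sketch; iter_batches and fetch_batch are hypothetical helpers, not Django APIs. In Django 1.2, fetch_batch could wrap something like Thing.objects.filter(pk__gt=last_pk).order_by('pk')[:size]:

```python
def iter_batches(fetch_batch, size=500):
    """Yield objects batch by batch.

    fetch_batch(last_pk, size) must return a list of objects with
    ascending .pk attributes, at most `size` long, and an empty list
    when nothing remains past last_pk.
    """
    last_pk = 0
    while True:
        batch = fetch_batch(last_pk, size)
        if not batch:
            break
        for obj in batch:
            yield obj   # CPU-heavy work runs here, between short queries
        last_pk = batch[-1].pk
```

Each batch is one quick query, so no single read spans the whole run of calculations and saves.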
Andre: yes, that's why I'm doing the calculation overnight and caching
it in the DB. But the data it's based on changes almost daily and we
want the calculated version we're working with to be no more than 48
hours old -- hence the nightly recalculation. I'd love tips on how to
get that nightly recalculation not to bring the entire site grinding
to a halt. ;-)
On Nov 22, 7:51 pm, Andre Terra <andrete...@gmail.com> wrote:
> You will definitely need to look into caching those results, perhaps
> "permanently" in a database.
>
> My recommendation is redis[1] and possibly tools like sebleier's
> django-redis-cache[2]. Cache invalidation is a pain, I know, but it's
> pretty much the only way to go.
>
> Long term, you will need to profile the bottlenecks and dive into the
> Django-generated SQL to see whether you can tune it by refactoring your
> queries, possibly dropping down to .raw() or hand-written SQL in some cases.
>
> There are plenty of presentations from python/django conferences out there
> that touch on the subject of ORM optimization, so don't be afraid to google.
>
> Good luck!
>
> Cheers,
> AT
>
> [1]http://redis.io
> [2]https://github.com/sebleier/django-redis-cache
>
> On Tue, Nov 22, 2011 at 9:04 PM, Nikolas Stevenson-Molnar <nik.mol...@consbio.org> wrote:
> > I wouldn't expect it to lock the database (though someone with more
> > database expertise should address that). I *would* expect it to consume
There's also an "ionice" command that should make disk (and possibly
network) I/O operate in a manner kinder to your machine. You'd
launch your cron job with the nice/ionice settings.
-tkc
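tkc's nice/ionice suggestion could look like this in a crontab; the schedule, interpreter path, and management command name are all placeholders:

```shell
# Run the nightly recalculation at 3:00 am with minimum CPU priority
# (nice 19) and the idle I/O scheduling class (ionice -c3), so the web
# server and database get serviced first.
0 3 * * * nice -n 19 ionice -c3 /usr/bin/python /path/to/manage.py recalc_things
```

With the idle I/O class, the job's disk reads only proceed when no other process wants the disk, which keeps a large table scan from starving the site.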
Can anyone comment on the database locking question?
From your pseudocode, it looks like you're performing a possibly-lengthy
select, followed by lengthy calculations (not involving database
operations), and then a (presumably) quick save. AFAIK, databases never
lock when doing selects, so depending on the nature of your
calculations, you should be fine.
Additionally, if in doubt, I recommend running some tests with the same
database used in your production environment.
_Nik
MySQL (with the default MyISAM storage engine) takes a table-level
lock when reading, which blocks writes. Therefore, any long-running
query on such a table will lock out updates to it until the read
completes.
The easiest way around this is with hardware. Use a master-slave DB
setup, and perform your long reads on the slave(s), and all your
writes on the master.
Cheers
Tom
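Django 1.2's multi-database support can express that read/write split in code. A hedged sketch of a database router; the 'replica' and 'default' alias names are assumptions that must match settings.DATABASES, and the MySQL replication itself still has to be set up separately:

```python
class ReadReplicaRouter(object):
    """Send ORM reads to a replica and writes to the master.

    Installed via settings.DATABASE_ROUTERS, e.g.
    DATABASE_ROUTERS = ['myproject.routers.ReadReplicaRouter']
    (path hypothetical).
    """

    def db_for_read(self, model, **hints):
        return 'replica'   # long SELECTs hit the slave

    def db_for_write(self, model, **hints):
        return 'default'   # saves go to the master
```

Individual queries can also be pointed explicitly with Thing.objects.using('replica') and thing.save(using='default').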
Querysets are fetched from the database in chunks, right? I imagine
that the select itself is actually quite quick, but do the tables
remain locked between chunks? I.e. if the query returns a total of N
records and Django fetches N/10 records and then performs a bunch of
calculations and saves before returning to the DB for another N/10
records, will the DB be locked during all those calculations/saves?
On Nov 24, 4:47 am, Tom Evans <tevans...@googlemail.com> wrote:
Well, it depends upon the database, but no, it doesn't work like that.
With MySQL, the query will cause the server to lock the table until
the client has finished reading all of the result. There are then two
modes to read the result from the server, row-by-row or all-at-once.
Django currently always fetches the entire result all at once,
regardless of how you then fetch the data from the queryset.
Cheers
Tom
But this result isn't the whole queryset result; it's a chunk of it.
The ORM adds 'LIMIT' arguments to the query. I think the answer to
Nan's question is that there's no lock across chunked reads.
--
Javier
On Nov 28, 10:17 am, Javier Guerra Giraldez <jav...@guerrag.com> wrote:
> On Mon, Nov 28, 2011 at 10:10 AM, Tom Evans <tevans...@googlemail.com> wrote:
> > Django currently always fetches the entire result all at once,
> > regardless of how you then fetch the data from the queryset.
Ah, I must have been misunderstanding a discussion[1] on Django-Developers.
> but this result isn't the whole queryset result, it's a chunk of it.
> the ORM adds 'LIMIT' arguments to the query. I think the answer to
> Nan's question is that there's no lock across chunked reads.
>
> --
> Javier
Thanks, Javier -- that's a huge help!
[1] http://groups.google.com/group/django-developers/browse_thread/thread/f19040e2e3229d7a#
(NB: you snipped the bit where I say that I am specifically talking
about MySQL - I still am)
No-one mentioned slicing or adding LIMITs into the query - but it is
irrelevant. When you issue a query, the DB tables that are read from
are locked until that data is returned to the client. That happens as
soon as mysql_store_result() in the MySQL C API finishes, at which
point all the data has been transferred from the server to the client.
This happens at the moment that you evaluate your query in django.
When you iterate through or otherwise access that query result in
django, it is no longer talking to the DB server, it does not fetch
any additional data, in chunks or otherwise. The queryset object cache
is built in chunks, but this relates to when Django creates model
instances, not communication with the database.
Cheers
Tom
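Tom's distinction can be restated as a toy sketch; this is pure illustration, not Django's actual implementation, and the chunk size of 100 is an arbitrary stand-in. All rows are already client-side after the query executes; the "chunks" only govern how the cache of model instances grows during iteration, with no further trips to the database:

```python
def build_cache_in_chunks(rows, chunk=100):
    """`rows` is the complete, already-fetched result (mysql_store_result
    has finished). The cache grows `chunk` items at a time as the caller
    iterates, but no database communication happens here."""
    cache = []
    for start in range(0, len(rows), chunk):
        piece = rows[start:start + chunk]
        cache.extend(piece)   # instance cache grows; no DB round-trip
        for item in piece:
            yield item
```

So slow iteration, or slow work between iterations, cannot hold a server-side lock open: the server was done the moment the full result was transferred.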