need django.cache advice


Sells, Fred

Jan 9, 2014, 8:28:07 PM
to django...@googlegroups.com

I’ve got a queryset that is roughly a 6-table join over a lot of data; it generates about 1000 records with about 25 fields each.

I’ve got ~100 terminals hitting this query every minute, each looking for a subset of about 25 records.

About 20 records change every minute.

I was thinking about updating the data (queryset) once a minute on a cron job and saving the result in the django.cache.

Then each terminal would access the cache and get the subset that applies to it.

Does that make sense?  Is there a better way?
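
Roughly what I’m picturing, as a management command run from cron (the "Reading" model, the field names, and the cache key are all made up):

    from django.core.cache import cache
    from django.core.management.base import BaseCommand

    from myapp.models import Reading  # stand-in for the real model

    class Command(BaseCommand):
        help = "Refresh the shared terminal dataset in the cache"

        def handle(self, *args, **options):
            # Run the big join once; list() evaluates the queryset now.
            rows = list(Reading.objects.values(
                "id", "terminal_id", "status", "patient__name",  # ...the ~25 fields
            ))
            # Expire a bit after the cron interval so readers never
            # hit an empty cache between refreshes.
            cache.set("terminal_dataset", rows, timeout=120)

and in the view, with terminal_id taken from the request:

    rows = cache.get("terminal_dataset") or []
    mine = [r for r in rows if r["terminal_id"] == terminal_id]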


Fred.

Erik Cederstrand

Jan 9, 2014, 9:29:07 PM
to Django Users
On 09/01/2014 at 21.28, Sells, Fred <fred....@adventistcare.org> wrote:

> I was thinking about updating the data (queryset) once a minute on a cron job and saving the result in the django.cache.
>
> Then each terminal would access the cache and get the subset that applies to it.
>
> Does that make sense? Is there a better way?

1000 records from 6 tables shouldn’t be a problem if your database is properly indexed. Databases are good at caching query results and not doing excess work when the data hasn’t changed. Maybe the first request from a terminal will be slow(ish), but the rest should be fast.

What sort of performance problems are you having? How fast is it now, and how fast do you need it to be? Have you profiled the requests from the terminal and determined that the culprit is indeed the database?

You could go with your own suggestion, you could create a database view, you could cache the result in Varnish if the terminals are doing HTTP requests. But it depends on where in the stack you are wasting time.

Erik

Sells, Fred

Jan 9, 2014, 9:43:31 PM
to django...@googlegroups.com
Thanks Erik, I was not aware of Varnish.

Erik wrote:

1000 records from 6 tables shouldn't be a problem if your database is properly indexed. Databases are good at caching query results and not doing excess work when the data hasn't changed. Maybe the first request from a terminal will be slow(ish), but the rest should be fast.

Fred: I was unclear. The initial join is 10000 x 1000 x 20 x 100 ... which gets reduced to about 1000 records. A few records are changed every second.

What sort of performance problems are you having? How fast is it now, and how fast do you need it to be? Have you profiled the requests from the terminal and determined that the culprit is indeed the database?

Fred: I suspect (but have not confirmed) that the problem is connection pooling (i.e., the lack thereof). Some requests time out and blank out the web page on the terminal.
The web page itself is very simple. Part of my "todo" is to put a timing decorator on this view method to confirm the problem. I've increased the virtual CPUs and RAM on the server, which helped a little, but deleting obsolete information helped a lot.
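
The decorator would be something like this (just a sketch that logs the elapsed time per request):

    import functools
    import logging
    import time

    logger = logging.getLogger(__name__)

    def timed(view_func):
        """Log how long a view takes, to confirm where the time goes."""
        @functools.wraps(view_func)
        def wrapper(request, *args, **kwargs):
            start = time.time()
            try:
                return view_func(request, *args, **kwargs)
            finally:
                logger.info("%s took %.3fs",
                            view_func.__name__, time.time() - start)
        return wrapper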

You could go with your own suggestion, you could create a database view, you could cache the result in Varnish if the terminals are doing HTTP requests. But it depends on where in the stack you are wasting time.

Fred: I cannot really cache the page because it should change every few minutes. I can accept a minute or two of delay in getting the latest data, but not 5 minutes.
I would probably (ugly, I know) update both the cache and the model on writes so users are not complaining that their changes don't show up.

Erik


Javier Guerra Giraldez

Jan 9, 2014, 9:57:59 PM
to django...@googlegroups.com
On Thu, Jan 9, 2014 at 4:43 PM, Sells, Fred <fred....@adventistcare.org> wrote:
>
> Fred: I was unclear. The initial join is 10000 x 1000 x 20 x 100 ... which gets reduced to about 1000 records. A few records are changed every second.


Are the 'interesting' 1000 out-records always the same, or changing slowly? Is it easy to test whether any specific in-record alters the result or not?

If yes on both, then you might invert the calculation from a 'pull' method (the current one) to a 'push' method, where you keep the result precalculated and, on each modification, test whether it's worthwhile to redo it.

This assumes:

A.- The process is expensive to do every time for every client (that's the problem).

B.- It's cheap to verify whether the new data alters the result (my initial questions).

C.- The queries are frequent enough that few precalculations would go unseen (I'd guess so, given the number of clients and the rate of queries).


If condition C doesn't hold, but B does, then another thing to try (sketched below):

- On each query, first check whether the answer is already precalculated. If it is, just use that.

- If it's not, do the job and store the result in the cache.

- On each input modification, check whether it alters the result. If so, just delete the cache entry.
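
In Django cache terms, that's roughly (the key name and the two helper functions are invented):

    from django.core.cache import cache

    CACHE_KEY = "terminal_dataset"

    def get_dataset():
        rows = cache.get(CACHE_KEY)
        if rows is None:
            rows = run_expensive_join()  # placeholder for the big 6-table query
            cache.set(CACHE_KEY, rows, timeout=None)  # None = keep until deleted
        return rows

    def record_changed(record):
        record.save()
        if alters_result(record):  # placeholder for the cheap check from condition B
            cache.delete(CACHE_KEY)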


Finally, if condition B doesn't hold, just store the result in a cache entry that expires in a minute. That means the process would run at most once a minute, and the clients could get results that are at most a minute old.
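
That is, roughly (same placeholder for the expensive query as above):

    rows = cache.get("terminal_dataset")
    if rows is None:
        rows = run_expensive_join()
        cache.set("terminal_dataset", rows, timeout=60)  # at most a minute stale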


--
Javier

Sells, Fred

Jan 9, 2014, 10:14:26 PM
to django...@googlegroups.com
Thanks, Javier.

The interesting 1000 records are changing very slowly. A few records could be added/deleted every minute but mostly it's some field changes to the records in the set.

No change is more or less worthwhile than any other change. All need to get into the database "eventually". Update is not expensive but regenerating the queryset is.

Erik Cederstrand

Jan 9, 2014, 10:20:06 PM
to Django Users
On 09/01/2014 at 22.43, Sells, Fred <fred....@adventistcare.org> wrote:

> Fred: I was unclear. The initial join is 10000 x 1000 x 20 x 100 ... which gets reduced to about 1000 records.

This still shouldn’t be a problem as long as the database has proper indexes. Is the reduction to (at most) 25 records something you *must* do in Django, or could you add additional filters to the queryset to get only the rows you want?
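
For example (the model and field names are guesses):

    # Let the database do the reduction instead of filtering in Python
    rows = Reading.objects.filter(terminal_id=terminal_id).values("id", "status")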

> Fred: I suspect (but have not confirmed) that the problem is connection pooling (i.e., the lack thereof). Some requests time out and blank out the web page on the terminal.

Ok. Does it work if you have only 5 or 10 simultaneous terminals instead of 100?

Here are some random suggestions:

Test persistent connections and connection pooling, as you said.
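
With Django 1.6 you can turn on persistent connections with CONN_MAX_AGE in settings.py (the other settings here are placeholders):

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql_psycopg2",  # whichever backend you use
            "NAME": "mydb",
            "CONN_MAX_AGE": 60,  # keep each connection open for 60 seconds
        }
    }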

Check that your web server is actually configured to handle 100 simultaneous connections.

Get the exact SQL query, log into your database and look at the execution plan: "EXPLAIN SELECT blablabla". Does it do full table scans or other nasty stuff? Missing indexes on the columns used in the join?
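
You can get the SQL Django generates from the queryset itself:

    print(qs.query)  # roughly the SQL Django sends; paste it into EXPLAIN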

Install the Django Debug Toolbar if you haven’t already, and look at the report for the request in your browser.

Erik