Single SQL update to multiple values instead of multiple sequential queries


Anshuman Aggarwal

Oct 18, 2014, 2:05:04 AM
to django-d...@googlegroups.com
Please see this enhancement request:

Unlike what Russ has suggested, I'm pretty sure that a single UPDATE query carrying a large number (thousands to millions) of per-row updates will be significantly faster than issuing multiple SQL UPDATE queries. If more people on the list feel this is not the case, I will happily run a test against PostgreSQL and confirm the results either way.

Assuming, however, that the performance benefit is significant, should we look at contributing a patch? If so, what should the API look like?
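
For concreteness, something along these lines is what I have in mind. The method name and signature below are purely hypothetical, just to anchor the discussion:

from datetime import datetime

# Hypothetical API -- no such method exists in Django today; the name
# and signature are illustrative only.
Request.objects.update_each({
    17: {'updated_at': datetime(2014, 10, 18, 2, 5)},
    18: {'updated_at': datetime(2014, 10, 18, 2, 6)},
})
# i.e. one UPDATE statement, with each primary key mapped to its own
# new field values.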

Use case:
For each row in a table we send a request to a server. We get individual updates back from the server informing us of the status of each request; each update corresponds to a row in the table. We want to store the datetime of the update but do not wish to hit the database every time (we have seen a performance impact, since the table is huge). We use memcached to batch the updates and do a single Django ORM .update() call, which works well but updates all rows to a common datetime. Ideally, we want to update each row with the datetime at which its own update was received.
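
The kind of single statement I have in mind looks roughly like this -- a sketch only; the table name, column names, and connection details are made up for illustration:

import psycopg2

conn = psycopg2.connect('dbname=mydb')  # connection details assumed

# One UPDATE joined against a VALUES list, so each row gets its own
# datetime rather than a shared one.
updates = [(1, '2014-10-18 02:05:04'), (2, '2014-10-18 02:06:10')]
sql = (
    "UPDATE request AS t "
    "SET updated_at = v.updated_at::timestamptz "
    "FROM (VALUES %s) AS v (id, updated_at) "
    "WHERE t.id = v.id"
) % ", ".join(["(%s, %s)"] * len(updates))
params = [p for pair in updates for p in pair]

with conn:
    with conn.cursor() as cur:
        cur.execute(sql, params)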

Also, if any Django/Postgres experts can advise on a way to run a large number of updates concurrently on a table, or suggest a better design for this, I would like to hear suggestions, but I can move that over to the Django Users mailing list.

Russell Keith-Magee

Oct 18, 2014, 5:00:47 AM
to Django Developers
Hi Anshuman,

On Sat, Oct 18, 2014 at 2:05 PM, Anshuman Aggarwal <anshuman...@gmail.com> wrote:
> Please see this enhancement request:
>
> Unlike what Russ has suggested, I'm pretty sure that a single UPDATE query carrying a large number (thousands to millions) of per-row updates will be significantly faster than issuing multiple SQL UPDATE queries. If more people on the list feel this is not the case, I will happily run a test against PostgreSQL and confirm the results either way.

Data trumps everything. I'll immediately stand down from my objections in the ticket if you can demonstrate there's a significant performance benefit.

One caveat to keep in mind in your tests: make sure you take transactions into account. In PostgreSQL, multiple standalone statements will each be executed as an individual transaction. To get a fair performance comparison, you should compare a "single statement" update against multiple updates *in the same transaction*. Of course, having performance data for all three approaches (single statement, multiple statements inside a transaction, multiple statements outside a transaction) can't hurt. Data for other databases would also be nice.
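
Something along these lines would cover all three cases -- a rough sketch using psycopg2 directly; the table, column, and connection details are assumptions:

import time
import psycopg2

N = 10000
updates = [(i, '2014-10-18 02:05:04') for i in range(1, N + 1)]

def multiple_autocommit(conn):
    # Each UPDATE runs as its own transaction.
    conn.autocommit = True
    cur = conn.cursor()
    for pk, dt in updates:
        cur.execute('UPDATE request SET updated_at = %s WHERE id = %s',
                    (dt, pk))

def multiple_in_transaction(conn):
    # Same statements, but one COMMIT at the end.
    cur = conn.cursor()
    for pk, dt in updates:
        cur.execute('UPDATE request SET updated_at = %s WHERE id = %s',
                    (dt, pk))
    conn.commit()

def single_statement(conn):
    # One UPDATE joined against a VALUES list.
    cur = conn.cursor()
    sql = (
        'UPDATE request AS t SET updated_at = v.updated_at::timestamptz '
        'FROM (VALUES %s) AS v (id, updated_at) WHERE t.id = v.id'
    ) % ', '.join(['(%s, %s)'] * len(updates))
    cur.execute(sql, [p for pair in updates for p in pair])
    conn.commit()

for fn in (multiple_autocommit, multiple_in_transaction, single_statement):
    conn = psycopg2.connect('dbname=mydb')
    start = time.time()
    fn(conn)
    print('%s: %.2fs' % (fn.__name__, time.time() - start))
    conn.close()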
 
> Assuming, however, that the performance benefit is significant, should we look at contributing a patch? If so, what should the API look like?
>
> Use case:
> For each row in a table we send a request to a server. We get individual updates back from the server informing us of the status of each request; each update corresponds to a row in the table. We want to store the datetime of the update but do not wish to hit the database every time (we have seen a performance impact, since the table is huge). We use memcached to batch the updates and do a single Django ORM .update() call, which works well but updates all rows to a common datetime. Ideally, we want to update each row with the datetime at which its own update was received.
>
> Also, if any Django/Postgres experts can advise on a way to run a large number of updates concurrently on a table, or suggest a better design for this, I would like to hear suggestions, but I can move that over to the Django Users mailing list.

Logging to the database is almost always the Wrong Thing To Do, unless you're seeing very low traffic, or you've got a separate database for logging. Depending on what you're doing with the logged data, there are much better options for logging, including:

 * persistent in-memory stores (like Redis)
 * analytics data stores (like ElasticSearch)
 * syslog with good analysis tools

The right approach for you will depend a little on what you're hoping to do with the data you're gathering.
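
For example, if the datetimes do eventually need to land in PostgreSQL, Redis can also serve as a write buffer in front of a batched update. A rough sketch -- the key name, flush policy, and schema here are assumptions, not a prescription:

import redis  # redis-py client

r = redis.StrictRedis(decode_responses=True)

def record_update(request_id, received_at):
    # O(1) in-memory write on the hot path; no database hit.
    r.hset('request:updated_at', request_id, received_at.isoformat())

def flush_to_db(cursor):
    # Periodically drain the buffer and apply one batched UPDATE.
    # (Simplified: updates arriving between HGETALL and DEL are lost;
    # a real version would RENAME the key first and drain the copy.)
    pending = r.hgetall('request:updated_at')
    if not pending:
        return
    r.delete('request:updated_at')
    sql = (
        'UPDATE request AS t SET updated_at = v.updated_at::timestamptz '
        'FROM (VALUES %s) AS v (id, updated_at) WHERE t.id = v.id'
    ) % ', '.join(['(%s, %s)'] * len(pending))
    params = []
    for pk, ts in pending.items():
        params.extend([int(pk), ts])
    cursor.execute(sql, params)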

Yours,
Russ Magee %-)
