Trac #22343 -- select-for-update in autocommit mode

Shai Berger

unread,

Mar 27, 2014, 10:20:23 AM3/27/14

to django-d...@googlegroups.com

Hi,

Ticket #22343[1] is about a couple of tests failing under Oracle. But it
raises a question that's a little deeper.

The relevant tests do a select-for-update to verify that the "for update"
clause is, indeed, added; and they happen to run in autocommit mode. They fail
because, under Oracle, in autocommit mode, the automatic commit happens
immediately after the command is executed -- and so, trying to fetch the
results fails for being done in a separate transaction.

However, with any backend, select-for-update in autocommit mode makes very
little sense. Even if it doesn't break (as it does on Oracle), it doesn't
really lock anything. So, IMO, executing a query that is a select-for-update
in autocommit mode is probably en error, and one that is likely to cause data-
corruption bugs.

So I'm suggesting we change the behavior of select-for-update queries, to
error out when executed with no transaction. This is a backwards-incompatible
change -- for some projects, app they have used happily before will suddenly
break. These projects should probably be thankful -- they were running with a
subtle bug that is now exposed -- but still.

Unless there are convincing objections to this, I'd even like to backport this
change, at least to 1.6, because of the potential data-corruption bugs.

Your opinions are welcome.

Thanks,
Shai.

[1] https://code.djangoproject.com/ticket/22343

Aymeric Augustin

unread,

Mar 27, 2014, 1:59:52 PM3/27/14

to django-d...@googlegroups.com

On 27 mars 2014, at 15:20, Shai Berger <sh...@platonix.com> wrote:

> However, with any backend, select-for-update in autocommit mode makes very
> little sense. Even if it doesn't break (as it does on Oracle), it doesn't
> really lock anything. So, IMO, executing a query that is a select-for-update
> in autocommit mode is probably en error, and one that is likely to cause data-
> corruption bugs.

I agree.

> So I'm suggesting we change the behavior of select-for-update queries, to
> error out when executed with no transaction. This is a backwards-incompatible
> change -- for some projects, app they have used happily before will suddenly
> break. These projects should probably be thankful -- they were running with a
> subtle bug that is now exposed -- but still.
>
> Unless there are convincing objections to this, I'd even like to backport this
> change, at least to 1.6, because of the potential data-corruption bugs.

Yes please.

The change should be as simple as adding:

if not self.connection.get_autocommit():
raise TransactionManagementError(“selet_for_update cannot be used outside of a transaction.”)

around line 160 in compiler.py.

--
Aymeric.

Shai Berger

unread,

Mar 30, 2014, 12:48:02 PM3/30/14

to django-d...@googlegroups.com

On Thursday 27 March 2014 19:59:52 Aymeric Augustin wrote:
> On 27 mars 2014, at 15:20, Shai Berger <sh...@platonix.com> wrote:
> > So I'm suggesting we change the behavior of select-for-update queries, to
> > error out when executed with no transaction. This is a
> > backwards-incompatible change -- for some projects, app they have used
> > happily before will suddenly break. These projects should probably be
> > thankful -- they were running with a subtle bug that is now exposed --
> > but still.
> >
> > Unless there are convincing objections to this, I'd even like to backport
> > this change, at least to 1.6, because of the potential data-corruption
> > bugs.
>
> Yes please.
>

While fixing this, I ran into this line in the current docs:
"""
Using select_for_update on backends which do not support SELECT ... FOR UPDATE
(such as SQLite) will have no effect.
"""

This raises two questions:

1) This is a little inconsistent with the approach suggested above -- on
SQLite, we allow code that asks for locks to run without them. Should we re-
think one of these decisions? That is, should we let select_for_update run
with no effect when there is no transaction, or, conversely, reject it entirely
on SQLite?

My view is that for SQLite, the guarantees of data correctness are so low
anyway, that we can allow this to pass, while on other backends the user
should expect better -- so, just do the change suggested above, in spite of
the apparent inconsistency.

2) Given the above -- should select_for_update be allowed in autocommit mode
on SQLite?

The argument here is that the most common (and only recommended) use of SQLite
with Django is for testing; disallowing select_for_update+autocommit+SQLite
may help people find problems faster. The counter-argument is that if the call
is not supposed to have any effect anyway, why bother validating that its effect
isn't nullified by the transaction mode?

Again, I tend towards the "status quo" -- let select_for_update queries be
executed in autocommit mode on SQLite. People should be testing on databases
they use for production; if all they use is SQLite, we shouldn't get in their
way, and good luck to them.

But if anybody disagrees, please speak up,

Shai.

Anssi Kääriäinen

unread,

Mar 31, 2014, 2:18:57 AM3/31/14

to django-d...@googlegroups.com

On SQLite we could also ensure we have locked the database. On SQLite
there can't be multiple writing transactions to the database, and if you
do select_for_update you are certainly going to write to the db. So the
right approach seems to be that the database is write locked already
when we do the select. There is even a dedicated RESERVED lock type for
this in SQLite. It seems the way to acquire RESERVED lock is to do a
no-effect UPDATE, that is "UPDATE thetable set pk = 0 where 1 = 0"
should give RESERVED lock to the database according to section 7.0 of
SQLite's docs [https://www.sqlite.org/lockingv3.html].

- Anssi

Shai Berger

unread,

Apr 2, 2014, 5:18:08 AM4/2/14

to django-d...@googlegroups.com

I've thought a lot about this -- I think the right approach is to add "support
select-for-update on sqlite" as a new feature in 1.8, and with it the
enforcement that select-for-update is not done in autocommit; but keep things
as they are (on Sqlite, not generally) for older versions.

Arguments for other positions still welcome,

Shai.

Marti Raudsepp

unread,

Apr 22, 2014, 11:37:37 AM4/22/14

to Django developers

On Thu, Mar 27, 2014 at 4:20 PM, Shai Berger <sh...@platonix.com> wrote:
> The relevant tests do a select-for-update to verify that the "for update"
> clause is, indeed, added; and they happen to run in autocommit mode. They fail
> because, under Oracle, in autocommit mode, the automatic commit happens
> immediately after the command is executed -- and so, trying to fetch the
> results fails for being done in a separate transaction.

Yes, please!

While we're already breaking backwards compatibility of
select_for_update() then I have another wish that would make it more
useful: a plain "FOR UPDATE" locks rows from all referenced tables in
the FROM/JOIN clauses. This has bad interactions with
select_related().

It's a bit of a mess, so not sure if it's worth solving. But I wanted
to get the issues out there for discussion.

1. In my view, select_related is mostly a hint to Django "I'm going to
need data from these tables too". I find it counter-intuitive that it
currently changes the behavior to lock all related rows too. For
example, consider:
Author.objects.select_for_update().select_related('country').filter(...)

Adding the select_related causes countries to be locked too -- which
unnecessarily limits concurrency and increases risk of deadlocks. The
number of countries is probably much more limited than authors.

2. In PostgreSQL, FOR UPDATE fails on tables that are LEFT JOINed with
error "FOR UPDATE cannot be applied to the nullable side of an outer
join"
So if the Author.country foreign key is nullable, the above example
always fails. Oracle and MySQL ignore these, so it's kind of a
compatibility issue.

Solutions
=========

A: If the database supports it, select_for_update could by default
only lock the base model and not select_related joins. Maybe it could
take a list of columns/tables to lock as well, not sure how useful
that would be.

B: Write a special case for PostgreSQL to skip locking of LEFT JOINed
tables. This behavior seems inconsistent, though.

Database support
================

SQLite: As I understand it doesn't support FOR UPDATE at all.

PostgreSQL: Supports FOR UPDATE OF table_alias [, table_alias ...] to
manually specify which table rows to lock. Fails with LEFT JOINs.

Oracle: Supports FOR UPDATE OF column_reference [, column_reference
...]; this is actually SQL standard syntax. I understand that we could
list the primary key column to force it to lock that row in a certain
table. Might not be entirely future-proof, but don't think any RDBMS
has plans for column-granularity locking.

MySQL: Always locks all FROM/JOIN table references, lacks support for
UPDATE OF entirely.

Regards,
Marti

Shai Berger

unread,

Apr 23, 2014, 10:43:42 AM4/23/14

to django-d...@googlegroups.com

Hi Marti,

On Tuesday 22 April 2014 18:37:37 Marti Raudsepp wrote:
> On Thu, Mar 27, 2014 at 4:20 PM, Shai Berger <sh...@platonix.com> wrote:
> > The relevant tests do a select-for-update to verify that the "for update"
> > clause is, indeed, added; and they happen to run in autocommit mode. They
> > fail because, under Oracle, in autocommit mode, the automatic commit
> > happens immediately after the command is executed -- and so, trying to
> > fetch the results fails for being done in a separate transaction.
>
> Yes, please!
>

As you might have noticed, this fix is already included in Django 1.6.3 & 1.7b2
(released yesterday) and in current master.

> While we're already breaking backwards compatibility of
> select_for_update() then I have another wish that would make it more
> useful: a plain "FOR UPDATE" locks rows from all referenced tables in
> the FROM/JOIN clauses. This has bad interactions with
> select_related().
>

We are limiting the breaking of backwards compatibility as much as we can.

(below reordered a bit)

>
> 1. In my view, select_related is mostly a hint to Django "I'm going to
> need data from these tables too". I find it counter-intuitive that it
> currently changes the behavior to lock all related rows too. For
> example, consider:
> Author.objects.select_for_update().select_related('country').filter(...)
>
> Adding the select_related causes countries to be locked too -- which
> unnecessarily limits concurrency and increases risk of deadlocks. The
> number of countries is probably much more limited than authors.
>

> Solution

> ========
>
> A: If the database supports it, select_for_update could by default
> only lock the base model and not select_related joins. Maybe it could
> take a list of columns/tables to lock as well, not sure how useful
> that would be.
>

In order to change the default behavior, we'd need a strong concensus around
your intuition; one which I am not at all sure exists. On the contrary, my
intuition is that people using select_related are generally aware of the form
of generated SQL, and should expect locking as the current code does.

Even if we had this concensus, we would probably be very hesitant about
breaking backwards compatibility. We only did so in the case of #22343 because
we had a strong suspicion (which has since been confirmed, privately) that
previous behavior was leading users to create data-corruption bugs.

> 2. In PostgreSQL, FOR UPDATE fails on tables that are LEFT JOINed with
> error "FOR UPDATE cannot be applied to the nullable side of an outer
> join"
> So if the Author.country foreign key is nullable, the above example
> always fails. Oracle and MySQL ignore these, so it's kind of a
> compatibility issue.
>
> Solution

> ========

>
> B: Write a special case for PostgreSQL to skip locking of LEFT JOINed
> tables. This behavior seems inconsistent, though.
>

I don't think skipping locks on PostgreSQL only is going to fly, either.
However,based on your report below, I think it might make sense to add an
argument to select_for_update() to specify (via relations, as in
select_related) the tables to avoid locking. Something along the lines of,

select_for_update(nowait=False, nolock=None)

and in your example above,

Author.objects.select_related('country').select_for_update(nolock=['country'])

Based on your review of support in backends, this could be implemented by
Postgres and Oracle; MySQL will not support this, just like it currently fails
to support the nowait argument.

Also note: Earlier in this thread, Anssi suggested a way to implement
select_for_update() for SQLite (locking the entire database -- that's what
SQLite does for every write). This may be added in the future -- of course,
implementing neither the existing nowait flag nor the proposed nolock.

Shai.

Marti Raudsepp

unread,

Apr 24, 2014, 7:21:33 AM4/24/14

to Django developers

On Wed, Apr 23, 2014 at 5:43 PM, Shai Berger <sh...@platonix.com> wrote:
> We are limiting the breaking of backwards compatibility as much as we can.

Fair enough.

> On the contrary, my
> intuition is that people using select_related are generally aware of the form
> of generated SQL, and should expect locking as the current code does.

I agree, they are probably aware. select_related was perhaps a bad
example, filters on related tables introduce JOINs too. I didn't make
a strong case for this, but I'd argue that the current behavior is
simply not useful. This is quite silly for instance:

a = Authors.objects.select_for_update()
if country_filter:
a = a.filter(country__code=country_filter)

So countries are locked if country_filter is present, but otherwise
not? I have a hard time imagining why anyone would want this behavior.

To the contrary, you usually have a limited set of tables that you
want to update. If you select a set of authors from the database,
you're probably not doing that because you want to update their
countries. You want to work with authors themselves, or lock them for
mutual exclusion to change their books, but not the other way.

If you wanted to update their countries, you'd instead write a query like:
Country.objects.filter(author__...).distinct()

I'm not just hand-waving, this is from practical experience with using
select_for_update() for mutual exclusion.

But I'm willing to accept that the default behavior is not going to change.

> I think it might make sense to add an
> argument to select_for_update() to specify (via relations, as in
> select_related) the tables to avoid locking. Something along the lines of,
>
> select_for_update(nowait=False, nolock=None)

> Author.objects.select_related('country').select_for_update(nolock=['country'])

In general I agree, but it would be a lot more useful to have a
parameter explicitly specifying the tables that I *do* want locked,
rather than the inverse -- per the rationale above. And this would
also allow doing select_for_update(lock=['country']) when there is no
select_related('country') or filter in the queryset, leading to more
predictable behavior.

But this brings the question, how do I specify that I want to lock
Author? select_for_update(lock=['']) seems like the most obvious
answer, but it looks odd.

Regards,
Marti

Reply all

Reply to author

Forward