MySQL data loss possibility with concurrent ManyToManyField saves


Tim Graham

Mar 14, 2016, 11:15:31 AM
to Django developers (Contributions to Django itself)
Could some MySQL users take a look at ticket #26347 [0] and recommend how to proceed? I think it's probably not a new issue but I'm a bit surprised it hasn't come up before if so.

[0] https://code.djangoproject.com/ticket/26347

Cristiano Coelho

Mar 14, 2016, 6:48:02 PM
to Django developers (Contributions to Django itself)
The django-admin interface is quite bad at handling concurrent modifications. This particular problem might not happen on other backends and is quite critical, but there are other issues too (not as critical as data loss, but still causing unexpected errors). For example: two people view a model with some inline relations; one of them deletes one of the inline rows, and the other then performs a save. The save tries to re-save the row that was just deleted, and you end up with a "please correct the errors below" message but no visible errors (because the row you tried to save no longer exists and is no longer loaded). Similar errors happen with other kinds of concurrent modifications.

As for the MySQL issue, using a higher isolation level might fix it, but I would strongly recommend not changing it, as this is mainly an issue with how Django performs the many-to-many save and not MySQL's fault at all. If you still go down this path, it would be a good idea not to set the isolation level change globally (in settings) but to accept it as an argument to transaction.atomic(), so it can be changed per transaction.

Would it work if, rather than performing a DELETE, SELECT, INSERT combo, you performed a DELETE and INSERT IGNORE? I believe this is supported by MySQL; PostgreSQL has similar functionality starting with 9.5, and it can be emulated through locks or nested queries.
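To illustrate the idea, here is a minimal sketch using SQLite's `INSERT OR IGNORE`, which is a stand-in for MySQL's `INSERT IGNORE` (and for PostgreSQL 9.5's `ON CONFLICT DO NOTHING`). The table and column names are invented for the example; this is not Django's actual m2m code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical m2m through-table; the composite primary key is what
# makes the insert-or-ignore skip duplicates.
conn.execute("CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER, "
             "PRIMARY KEY (book_id, author_id))")

def set_m2m(conn, book_id, author_ids):
    # Replace the DELETE-SELECT-INSERT dance with DELETE + insert-or-ignore:
    # existing pairs are silently skipped, so no SELECT is needed to find them.
    placeholders = ",".join("?" * len(author_ids))
    conn.execute("DELETE FROM book_authors WHERE book_id = ? "
                 "AND author_id NOT IN (%s)" % placeholders,
                 [book_id] + list(author_ids))
    conn.executemany(
        "INSERT OR IGNORE INTO book_authors (book_id, author_id) VALUES (?, ?)",
        [(book_id, a) for a in author_ids])
    conn.commit()

set_m2m(conn, 1, [10, 20])
set_m2m(conn, 1, [20, 30])   # 20 already present: the IGNORE skips it
rows = sorted(conn.execute("SELECT author_id FROM book_authors WHERE book_id = 1"))
print(rows)  # [(20,), (30,)]
```

The point of the sketch is only that the duplicate check moves into the database, removing one round trip and the read-then-write race on the SELECT.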

What if there were a way to store the last modified date of an object (check its history, maybe?), save that date locally in the admin form, and when performing the actual save compare the two and raise an error if they do not match? That would prevent most concurrent modification issues (although it makes edits that usually have no issues a bit more annoying). You might still be prone to a small race condition on the modification date, unless you perform a SELECT FOR UPDATE on it first.
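This is the classic optimistic-locking pattern. A minimal sketch, assuming a hypothetical schema with a version counter (a counter avoids the timestamp-granularity race mentioned above), using SQLite only for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented table for the example: a version column bumped on every save.
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO article VALUES (1, 'draft', 0)")
conn.commit()

class ConcurrentModification(Exception):
    pass

def save(conn, pk, body, expected_version):
    # Compare-and-swap: the UPDATE matches only if nobody bumped the
    # version since we loaded the row, closing the read-check-write race.
    cur = conn.execute(
        "UPDATE article SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (body, pk, expected_version))
    conn.commit()
    if cur.rowcount == 0:
        raise ConcurrentModification("row was changed or deleted by someone else")

save(conn, 1, "first edit", expected_version=0)      # succeeds; version is now 1
try:
    save(conn, 1, "stale edit", expected_version=0)  # loses the race
except ConcurrentModification as exc:
    print("rejected:", exc)
```

Because the check and the write are a single UPDATE, no SELECT FOR UPDATE is needed for this particular pattern.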

Shai Berger

Mar 14, 2016, 7:12:29 PM
to django-d...@googlegroups.com
Hi,

I just commented on the ticket, but wanted to clarify a few things here:

On Tuesday 15 March 2016 00:48:02 Cristiano Coelho wrote:
> The django-admin interface is quite bad at handling concurrent
> modifications, this is one problem that might not happen on other backends
> and is quite critical, but other issues (that ain't critical like data loss
> but might cause unexpected errors) like two people viewing a model with
> some inline relations, one of them deletes one of the inline rows, the
> other one performs a save, it will try to re-save the one that's actually
> deleted and you end up with a "please correct the errors below" but you
> actually see no errors (because the row you tried to save no longer exists
> and is no longer loaded). Similar errors happen with other kind of
> concurrent modifications.

Right.

> As of the MySQL issue, using a higher isolation level might fix the issue

For the record, the suggestion made was to use a *lower* isolation level, to
prevent the failure of transactions; I disagree with that recommendation, as
it "prevents" the failure by enabling the kinds of data corruption you
described above.

> but I would strongly recommend not changing it as this is mainly an issue
> of how django performs the many-to-many save and not MySQL's fault at all.

... and there I completely disagree. Django handles the updates in a
transaction; first it deletes records, then (if they don't exist) writes them.
The MySql docs explain that records are "snapshotted" at the beginning of the
transaction, and so may actually be stale later on in it -- but that does not
hold when the changes to the records are done *in* the transaction.
I asked the ticket's OP to describe a scenario where this is anything but a
severe MySql bug, and I ask the same of you.

Shai.

Cristiano Coelho

Mar 14, 2016, 8:39:27 PM
to Django developers (Contributions to Django itself)
I can't tell whether it is a bug in MySQL or not, but I understood the docs the same way you did (the first example with Session A and Session B makes it clearer), so I can't quite understand how the poster got that issue. I would like an example similar to the one in the docs, but with a delete in the middle; it is a bit unclear what happens. Could a delete in the middle create a new "snapshot" for your next select?


So assuming the issue is real and not a bug in MySQL:

- Would a DELETE + INSERT IGNORE fix the issue, while also improving many-to-many updates on backends that support INSERT IGNORE? (one less SELECT)

- Would a different isolation level help? What about SERIALIZABLE? The only good way to do this would be to allow changing the isolation level per transaction, by extending transaction.atomic(), so that we don't affect other db operations by changing the isolation level globally.

- Force a lock for the many-to-many updates (using SELECT FOR UPDATE, maybe)?

- Use the object history to perform date matching and detect concurrent modifications of the same object before we apply them, raising an error?
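The per-transaction isolation level suggested above did not exist in Django's transaction.atomic() at the time. A hypothetical sketch of what such a helper could look like (the helper name and the stub connection are invented; on MySQL, SET TRANSACTION applies only to the next transaction started, so the session default is untouched):

```python
from contextlib import contextmanager

@contextmanager
def atomic_with_isolation(connection, level="READ COMMITTED"):
    # Hypothetical helper: raise (or lower) the isolation level for the
    # *next* transaction only, leaving the global default alone.
    connection.execute("SET TRANSACTION ISOLATION LEVEL %s" % level)
    connection.execute("BEGIN")
    try:
        yield connection
        connection.execute("COMMIT")
    except Exception:
        connection.execute("ROLLBACK")
        raise

# Stub connection that records the SQL it is asked to run,
# just to show the statement ordering.
class RecordingConnection:
    def __init__(self):
        self.statements = []
    def execute(self, sql):
        self.statements.append(sql)

conn = RecordingConnection()
with atomic_with_isolation(conn, "SERIALIZABLE"):
    conn.execute("DELETE FROM m2m_table WHERE ...")
print(conn.statements[0])  # SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
```

This is a sketch of the proposed API shape only; a real implementation would have to cooperate with Django's connection and autocommit management.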

Karen Tracey

Mar 18, 2016, 5:15:59 PM
to django-d...@googlegroups.com
This is the 2nd major issue I can recall caused by MySQL's default REPEATABLE READ transaction isolation level. I think Django should simply switch itself to READ COMMITTED by default, consistent with (all?) the other supported database backends, document how a user who really, really wants REPEATABLE READ can keep it (I assume Django could allow that?), and document what known problems with basic Django functions they may run into if they do.

I fear our existing approach of documenting that certain functions don't work by default on MySQL (e.g. get_or_create) is not really helping the majority of our users. I believe making Django code itself work by default on MySQL would be a better long-term solution for those who use MySQL with Django, and would avoid future cases like this one, which was discovered years after we knew get_or_create was broken by the default transaction isolation level on MySQL.
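For reference, a per-connection override was already possible at the time through the MySQLdb driver's init_command connection option, which Django passes through from OPTIONS. A hedged sketch of what such documentation might show (database name and credentials are placeholders):

```python
# settings.py -- sketch, assuming the MySQLdb driver; names are placeholders.
# init_command runs once per new connection, so every connection starts in
# READ COMMITTED instead of MySQL's default REPEATABLE READ.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "...",
        "OPTIONS": {
            "init_command": "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
        },
    }
}
```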

On Mon, Mar 14, 2016 at 11:15 AM, Tim Graham <timog...@gmail.com> wrote:
Could some MySQL users take a look at ticket #26347 [0] and recommend how to proceed? I think it's probably not a new issue but I'm a bit surprised it hasn't come up before if so.

[0] https://code.djangoproject.com/ticket/26347


Aymeric Augustin

Mar 20, 2016, 6:14:15 PM
to django-d...@googlegroups.com
I agree with Karen.

-- 
Aymeric.

Curtis Maloney

Mar 20, 2016, 7:24:31 PM
to django-d...@googlegroups.com
+1 for me, too... this aligns with "safe/secure by default".

--
C


Cristiano Coelho

Mar 20, 2016, 8:25:37 PM
to Django developers (Contributions to Django itself)
What performance changes can you expect from this change? It is probably the default on MySQL for a good reason.

Shai Berger

Mar 21, 2016, 3:27:59 AM
to django-d...@googlegroups.com
First of all, I would like to say that I strongly support the move to READ
COMMITTED, including backporting it to 1.8.x.

But we also need to explain: REPEATABLE READ is a higher transaction isolation
level than READ COMMITTED. If you have problematic code, it should lead to
more deadlocks and/or transactions failing at commit time (compared to READ
COMMITTED), not to data loss. The reason we get data losses is MySql's unique
interpretation of REPEATABLE READ. If you're interested in the details (and if
you use MySql, you should be), read on.

With MySql's REPEATABLE READ, the "read" operations -- SELECT statements --
indeed act like they act in the usual REPEATABLE READ: Once you've read some
table, changes made to that table by other transactions will not be visible
within your transaction. But "write" operations -- UPDATE, DELETE, INSERT and
the like -- act as if they're under READ COMMITTED, affecting (and affected by)
changes committed by other transactions. The result is, essentially, that
within a transaction, the reads are not guaranteed to be consistent with the
writes [1].

In particular, in the bug[2] that caused this discussion, we get the following
behavior in one transaction:

(1) BEGIN TRANSACTION

(2) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned)

(3) (some other transactions commit)

(4) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)

(5) DELETE FROM some_table WHERE some_field=some_value
(answer: 1 row deleted)

(6) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)

(7) COMMIT
(the row that was returned earlier is no longer in the database)

Take a minute to read this. Up to step (5), everything is as you would expect;
you should find steps (6) and (7) quite surprising.

This happens because the other transactions in (3) deleted the row that is
returned in (2), (4) & (6), and inserted another one where
some_field=some_value; that other row is the row that was deleted in (5). The
row that this transaction selects was not seen by the DELETE, and hence not
changed by it, and hence continues to be visible by the SELECTs in our
transaction. But when we commit, the row (which has been deleted) no longer
exists.
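To make the mechanics concrete, here is a toy Python model of the semantics described above: SELECTs read from a snapshot taken at the first read, while DELETE acts on the latest committed data. This is an illustration only, not MySQL itself, and the row keys and field names are invented:

```python
# Toy model of MySQL REPEATABLE READ as described above. Not a real
# database: just enough state to reproduce the surprising steps (6)-(7).
committed = {"row1": {"some_field": "some_value"}}   # shared, committed state

class Txn:
    def __init__(self):
        self.snapshot = None

    def select(self, field, value):
        if self.snapshot is None:            # snapshot taken on first read
            self.snapshot = {k: dict(v) for k, v in committed.items()}
        return [k for k, r in self.snapshot.items() if r[field] == value]

    def delete(self, field, value):
        # "Current read": the write sees committed data, not the snapshot.
        victims = [k for k, r in committed.items() if r[field] == value]
        for k in victims:
            del committed[k]
        return len(victims)

t = Txn()
print(t.select("some_field", "some_value"))  # ['row1']

# Step (3): another transaction replaces row1 with row2 and commits.
del committed["row1"]
committed["row2"] = {"some_field": "some_value"}

print(t.select("some_field", "some_value"))  # still ['row1'] -- the snapshot
print(t.delete("some_field", "some_value"))  # 1 -- but it deleted row2!
print(t.select("some_field", "some_value"))  # still ['row1']
print(sorted(committed))                     # [] -- both rows are now gone
```

The mismatch between what the transaction reads (the snapshot) and what its DELETE touched (the committed state) is exactly the inconsistency described in [1].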

I have expressed elsewhere my opinion of this behavior as a general database
feature, and feel no need to repeat it here; but I think that, if possible, it
is Django's job as a framework to protect its users from it, at least as a
default.

On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
> What performance changes can you expect doing this change? It is probably
> that default on MySQL for a good reason.

The Django project is usually willing to give up quite a lot of performance in
order to prevent data losses. I agree that this default on MySql is probably
for a reason, but I don't think it can be a good reason for Django.

Have fun,
Shai.

[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
[2] https://code.djangoproject.com/ticket/26347

Anssi Kääriäinen

Mar 21, 2016, 3:59:37 AM
to django-d...@googlegroups.com
I'm strongly -1 on changing the default isolation level in a minor
release. We can recommend users switch the level and complain loudly
if they don't. But just changing the isolation level has potential for
breaking working code.

- Anssi

Aymeric Augustin

Mar 21, 2016, 3:59:39 AM
to django-d...@googlegroups.com
On 21 Mar 2016, at 01:25, Cristiano Coelho <cristia...@gmail.com> wrote:
>
> What performance changes can you expect doing this change? It is probably that default on MySQL for a good reason.

Barring implementation weirdness, reducing the isolation level is supposed to improve performance (if it makes a difference at all).

--
Aymeric.

Shai Berger

Mar 21, 2016, 4:41:00 AM
to django-d...@googlegroups.com
My recommendation to backport is based on the observation that the peculiar REPEATABLE READ behavior is highly conducive to data loss in the presence of concurrency, combined with a sense that it is not very well known; I find it much more likely that the change will fix broken code than break genuinely working code.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Anssi Kääriäinen

Mar 21, 2016, 7:33:51 AM
to django-d...@googlegroups.com
This being MySQL, I wouldn't be surprised if changing the isolation
level introduced new problems. Also, user code might rely on
Django using REPEATABLE READ. If we introduce the change into stable
releases, we risk breaking sites that currently work perfectly well.
To me this is against our backwards-compatibility rules, which state
that we avoid such changes except when completely unavoidable.

- Anssi

Aymeric Augustin

Mar 21, 2016, 7:39:08 AM
to django-d...@googlegroups.com
We’ve had, and known about, such problems for so long that I’m also
inclined not to make that change in already released versions.

--
Aymeric.

hcha...@medici.tv

Mar 21, 2016, 7:55:38 AM
to Django developers (Contributions to Django itself)
Hi,

I'm the ticket OP.

As a user, I definitely expect that my ORM won't eat my records when I save something, even with concurrency. Furthermore, I expect it to work with the default configuration of the backend; if I have to change that configuration, the ORM should refuse to work until it is right, or at least display a huge warning.

I agree with Karen that documenting that some functions don't work is not helping. Especially since this is Django's fault, not MySQL's. I mean, sure, MySQL has some very weird isolation modes, I definitely agree with Shai on that, but an ORM's job is to know all that, and to use the database in such ways that it is safe. I reckon those safe ways exist, and that the problem in Django is not the MySQL default isolation level, but the DELETE-SELECT-INSERT way of updating M2Ms. Incidentally, to prove my point, this has been changed in Django 1.9 and data-loss doesn't happen anymore, in that same default isolation level.

It seems that setting READ COMMITTED as the default level would help in the particular schedule of operations described in the ticket, but I disagree that it will always work and not cause problems elsewhere. For example, given a different schedule of the same operations needed to update an M2M (DELETE-SELECT-INSERT), under READ COMMITTED, data loss could still happen (see attachment).

Granted, this would be less of a problem, because it would (I think) require very close synchronicity of the two sets of operations. That requirement does not apply in the ticket's case: it isn't evident from the step-by-step in the ticket, but between the two SELECTs in sessions A and B and the DELETE in session A, a "great" deal of time passes. This is because (if I understand correctly) a transaction is started to update each of the model's fields, and it is that single transaction that includes the update to the M2M field. Thus, the only requirement is that those two long transactions overlap.

As a matter of fact, this means that the bigger the model, the easier it is to trigger the bug. I tried to replicate it in a minimal toy project, and I couldn't get enough synchronicity to replicate it (I only tried by double-clicking fast on the save button in the admin site, though). On my production project, I can reproduce the bug every time.

In my opinion, the fix should not be to set a different default isolation level, as that could trigger other problems that may be very hard to find. I was lucky to find this one. I think that Django should find sequences of operations that are proven to be safe under specific isolation levels. Those may need to be tailored for each backend. Maybe for PostgreSQL, that DELETE-SELECT-INSERT is safe, but it simply isn't in MySQL, neither in REPEATABLE READ nor READ COMMITTED. Thus this sequence could be used for PostgreSQL, but another one should be found for MySQL.
[Attachment: m2m_django_read_committed]

Cristiano Coelho

Mar 21, 2016, 8:28:11 AM
to Django developers (Contributions to Django itself)
Shai Berger, this explanation is pure gold! Definitely better than MySQL's.

Now I may agree that changing the isolation level should probably be left to a major release, or at least carry a huge warning in the release notes. I personally can't think of any project of mine where this would be an issue, since database errors are usually well handled (and I think this change would increase them, since you would be reading more data modified by other transactions, if I read correctly), and critical operations should be handled with either a SELECT FOR UPDATE or transactional operations (such as doing col = col + 1).

Aymeric Augustin

Mar 21, 2016, 8:35:19 AM
to django-d...@googlegroups.com
Hello,

On 21 Mar 2016, at 12:01, hcha...@medici.tv wrote:

In my opinion, the fix should not be to set a different default isolation level, as that could trigger other problems that may be very hard to find. I was lucky to find this one. I think that Django should find sequences of operations that are proven to be safe under specific isolation levels.

The problem is to determine what “safe” means here :-) I’m afraid this won’t be possible in general (at least not in a remotely efficient fashion) because Django inevitably depends on the guarantees of the underlying database, and those guarantees are weak at lower isolation levels.

-- 
Aymeric.

Hugo Chargois

Mar 21, 2016, 10:02:07 AM
to Django developers (Contributions to Django itself)
On Monday, 21 March 2016 at 13:35:19 UTC+1, Aymeric Augustin wrote:

The problem is to determine what “safe” means here :-) I’m afraid this won’t be possible in general (at least not in a remotely efficient fashion) because Django inevitably depends on the guarantees of the underlying database, and those guarantees are weak at lower isolation levels.

Sure, some of it is left to interpretation. There are some tricky cases, like two conflicting updates happening at the same time. Would safe mean that they should both succeed, even if there would be a lost update? Or would it mean that one of them should fail? But here, we have "two non-conflicting updates (no-ops, even) cause data to be lost". I bet no one would call this safe.

The guarantees offered by the database may not be perfect, but my point is that Django uses those guarantees badly. Consistent read is a "feature": it can be used to make things "safer", but in this bug it is simply used incorrectly, because of false assumptions. I reckon that given those same guarantees, and knowing how they may differ from backend to backend, they can be used to achieve "safer" behavior than this.

That's also why I don't think lowering the isolation level is a good idea. The higher the isolation level, the more "guarantees" there are to build upon. This bug is an unfortunate case where an assumption that is true at a lower isolation level no longer holds at a higher one. That doesn't mean the assumption was safe to make in the first place.

I applied the patch Shai proposed in the ticket (thanks!), which replaces the SELECT with a SELECT FOR UPDATE. This effectively fixes the bug, and I think it is the right kind of fix: fix the code, don't change the isolation level because it happens to fix the code by chance.

Aymeric Augustin

Mar 21, 2016, 1:19:02 PM
to django-d...@googlegroups.com
On 21 Mar 2016, at 15:02, Hugo Chargois <hcha...@medici.tv> wrote:

But here, we have "two non-conflicting updates (no-ops, even) cause data to be lost". I bet no one would call this safe.

Yes, this one was clearly a bug (and, if I understand correctly, it was resolved by a not-directly-related change, originally considered a performance optimization).

What I meant, but failed to convey clearly, is: can we derive general and useful rules from this bug, or are we stuck with dealing with such problems on a case by case basis?

In theory, we could:

1. make a list of “atomic” ORM operations (create, update, delete, etc.) — things you write in one line and thus expect to complete consistently
2. figure out which operations incur multiple queries — concrete inheritance and many-to-many relationships come to mind
3. investigate what unexpected results could occur in the presence of concurrent queries — I have no idea how to do that exhaustively

Furthermore, I suspect the answer to 3. will be “anything could happen” at lower isolation levels and “that looks to be safe” at higher isolation levels. I also suspect there’s little we could do to improve the situation at lower isolation levels, except if we have other instances of unoptimized M2M code. (Optimizing for performance means making fewer queries, which means improving the chances of transactional correctness.)

In the end, “figure out where the ORM could do fewer queries” sounds like an easier plan than “figure out all possible sequences of queries and whether they’re safe”.

-- 
Aymeric.

Shai Berger

Mar 21, 2016, 7:11:31 PM
to django-d...@googlegroups.com
Hi,

On Monday 21 March 2016 13:01:27 hcha...@medici.tv wrote:
>
> I agree with Karen that documenting that some functions don't work is not
> helping. Especially since this is Django's fault, not MySQL's. I mean,
> sure, MySQL has some very weird isolation modes, I definitely agree with
> Shai on that, but an ORM's job is to know all that, and to use the database
> in such ways that it is safe. I reckon those safe ways exist,

I disagree. The ORM cannot keep you safe against MySql's REPEATABLE READ.

> and that the
> problem in Django is not the MySQL default isolation level, but the
> DELETE-SELECT-INSERT way of updating M2Ms.

The reason for that, and the major point you seem to be missing, is that
Django 1.8's DELETE-SELECT-INSERT dance for updating M2M's is not reserved to
Django's internals; you can be quite certain that there are a *lot* of similar
examples in user code. And that user code, making assumptions that hold
everywhere except with MRR, is not buggy. It is MySql's documented behavior
that is insensible.

> Incidentally, to prove my point,
> this has been changed in Django 1.9 and data-loss doesn't happen anymore,
> in that same default isolation level.
>

That indeed seems to support your point, but it is far from proving it.

It is trivial to show that MRR simply isn't repeatable read:

create table yoyo (id int, age int, rank int);
insert into yoyo values (1,2,3);

Now, using MySql's so-called REPEATABLE READ,

SESSION A                          SESSION B
=====================================================
SET autocommit=0;                  SET autocommit=0;

SELECT * FROM yoyo;                SELECT age FROM yoyo;
  (1,2,3)                            (2)

UPDATE yoyo SET age=5;
  (ok, 1 row affected)

                                   UPDATE yoyo SET rank=7;
                                     (blocks)

COMMIT;
  (ok)
                                     (resumes)
                                     (ok, 1 row affected)

                                   SELECT age FROM yoyo;
                                     (5)

Session B sees, in its reads, changes made by session A after B's transaction
started. This is called, in the SQL standard, "a non-repeatable read". This is
*exactly* the situation that REPEATABLE READ is required to prevent. Calling a
transaction isolation level that allows this "REPEATABLE READ" is a lie. At
the very least, MySql needs to change the name of this isolation level to
something less misleading.

> It seems that setting a READ COMMITTED default level would help in that
> particular schedule of operations that is described in the ticket, but I
> disagree that it will work always and not cause problems elsewhere. For
> example, given a different schedule of the same operations needed to update
> a M2M, (DELETE-SELECT-INSERT), under READ COMMITTED, data-loss could still
> happen (see attachment).
>

I can reproduce this. But frankly, I cannot understand it. Can you explain
what happens here? Point to some documentation? The way I understand it, the
DELETE from session B either attempts to delete the pre-existing record (in
which case, when it resumes after the lock, it should report "0 records
affected" -- that record has been already deleted) or the one inserted by
session A (in which case, the second SELECT in session B should retrieve no
records). In either case, the SELECT in session B should not be finding any
record which has already been deleted, because all deletions have either been
committed (and we are in READ COMMITTED) or have been done in its own
transaction. What is going on here?

The point of the above question, besides academic interest, is that this
behavior is also suspect of being a bug. And if it is, Django should try to
work around it and help users work around it, but not base its strategic
decisions on its existence.

>
> In my opinion, the fix should not be to set a different default isolation
> level, as that could trigger other problems that may be very hard to find.
> I was lucky to find this one. I think that Django should find sequences of
> operations that are proven to be safe under specific isolation levels.

Again: The burden of "finding operation sequences" on Django is relatively
small -- most ORM operations are single queries. Most of the burden created by
MySql's REPEATABLE READ falls on users, who should not be expected to deal
with such blatant violations of common standards and terminology.

Shai.

Hugo Chargois

Mar 22, 2016, 5:57:03 AM
to Django developers (Contributions to Django itself)
On Tuesday, 22 March 2016 at 00:11:31 UTC+1, Shai Berger wrote:
> I disagree. The ORM cannot keep you safe against MySql's REPEATABLE READ.

>> Incidentally, to prove my point, 
>> this has been changed in Django 1.9 and data-loss doesn't happen anymore, 
>> in that same default isolation level. 
>> 
>
>That indeed seems to support your point, but it is far from proving it. 

I'm unsure that your relentless rant against MySQL is useful in this topic. You have to acknowledge that the data-loss behavior, as far as the bug that started this topic is concerned, is fixed in 1.9 by making better queries against the database. It is also fixed by your SELECT FOR UPDATE patch. The point I'm trying to make is not that the ORM should magically be perfect against all backends, however broken they may be, but that it should guarantee the integrity of data as much as is allowed by an intelligent use of the backend. Using SELECT FOR UPDATE, or whatever is done in 1.9, is a smarter use of the MySQL backend.

What are you trying to prove here? That MySQL is broken? That may or may not be the case (probably not), but it is quite off-topic. Even if it doesn't adhere to the ISO/ANSI standard for isolation-level behavior, there's not a lot you can do. MySQL is by far the most popular open-source DBMS. I hate it as much as you do and would heartily prefer to use PostgreSQL, but that's a fact. There may be grounds for complaining on the MySQL bug tracker, but I'm not sure it would go anywhere. They probably won't care; given its popularity, MySQL basically is the de facto SQL standard. And even if they do care, they would be unwilling to break compatibility by changing how isolation levels work. And even if they do change them, it would take some time for users to get the new MySQL version. In any case, Django would still need fixing.

> The reason for that, and the major point you seem to be missing, is that
> Django 1.8's DELETE-SELECT-INSERT dance for updating M2M's is not reserved to
> Django's internals; you can be quite certain that there are a *lot* of similar
> examples in user code. And that user code, making assumptions that hold
> everywhere except with MRR, are not buggy. It is MySql's documented behavior
> that is insensible.

You're missing the point. First, that same "dance" in user code is nowhere near as widely used as the one in Django internals. As a user, I've never ever used explicit Django transactions, at all. Have I saved models? Sure. Loads. Second, the onus is on Django's ORM to get its moves right. It's on me to get mine right. I know that. I accept that. As a user, that's probably the most important reason why I'm using an ORM: because I don't know a lot about database transaction caveats, I trust the ORM to do transactions right and not eat my data. If someday I need to do tricky things and handle transactions myself, I will take a hard look at how transactions work on my database backend, and if I get it wrong, it will be my fault.

Maybe a note on Django's "Database transactions" page would be helpful to warn users about the limitations of MySQL?

I can reproduce this. But frankly, I cannot understand it. Can you explain
what happens here? Point to some documentation? The way I understand it, the
DELETE from session B either attempts to delete the pre-existing record (in
which case, when it resumes after the lock, it should report "0 records
affected" -- that record has already been deleted) or the one inserted by
session A (in which case, the second SELECT in session B should retrieve no
records). In either case, the SELECT in session B should not find any record
that has already been deleted, because all deletions have either been
committed (and we are in READ COMMITTED) or have been done in its own
transaction. What is going on here?
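To make the reproduction easier to follow, here is one plausible two-session sketch of it. The `app_m2m` table, its columns, and the row values are all hypothetical, and the reading of what happens is my own, not something I can point to in the docs:

```sql
-- Both sessions: InnoDB, default REPEATABLE READ isolation.

-- Session B starts and establishes its read snapshot:
BEGIN;                                           -- B
SELECT * FROM app_m2m WHERE a_id = 1;            -- B: consistent read; snapshot fixed

-- Session A replaces the row:
BEGIN;                                           -- A
DELETE FROM app_m2m WHERE a_id = 1;              -- A: deletes the old row
INSERT INTO app_m2m (a_id, b_id) VALUES (1, 1);  -- A: re-inserts it under a new primary key

-- Session B starts its own DELETE-SELECT-INSERT dance:
DELETE FROM app_m2m WHERE a_id = 1;              -- B: blocks on A's locks...
COMMIT;                                          -- A commits; B's DELETE resumes and
                                                 --   removes A's freshly committed row
SELECT * FROM app_m2m WHERE a_id = 1;            -- B: consistent read; the snapshot STILL
                                                 --   shows the old row, so (1, 1) "exists"
-- B therefore skips its INSERT and commits: the (1, 1) row is lost.
COMMIT;                                          -- B
```

If this reading is right, the already-deleted record that B's SELECT finds is the old snapshot version of the row, which session B itself never touched: B's DELETE, being a "current" read, hit only session A's freshly committed copy.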

The point of the above question, besides academic interest, is that this
behavior is also suspected of being a bug. And if it is, Django should try to
work around it and help users work around it, but not base its strategic
decisions on its existence.

Sorry, I have no idea why this example behaves like that. I tried it, thinking "that could break, maybe", and it did, and that's it.
 
Again: The burden of "finding operation sequences" on Django is relatively
small -- most ORM operations are single queries.

Great, then maybe this bug is one of the last of its kind :) And a manual search for others -- by finding "maybe broken" multiple-query operations, testing them concurrently, and then fixing or optimizing them if need be -- would be within reach.

Shai Berger

unread,
Mar 22, 2016, 6:33:10 PM
to django-d...@googlegroups.com
On Tuesday 22 March 2016 11:57:03 Hugo Chargois wrote:
>
> I'm unsure that your relentless rant against MySQL is useful in this topic.

I'm sorry if anyone sees it as a rant. It is intended as a technical argument
that MySQL's REPEATABLE READ is unsuitable for use as a default by Django.

>
> You're missing the point. First, that same "dance" in user code is nowhere
> near as widely used as the one in Django internals. As a user, I've never
> ever used explicit Django transactions, at all. Have I saved models? Sure.
> Loads.

Most non-trivial Django projects tie transactions to requests, and don't worry
about transactions beyond that. Their authors expect basic, limited
consistency guarantees to be gained by that. It is Django's job to try, as
best as it can, to fulfill these expectations.

> Second, the onus is on Django's ORM to get its moves right. It's on me
> to get mine right. I know that. I accept that. As a user, that's probably
> the most important reason why I'm using an ORM: because I don't know a lot
> about database transaction caveats, and therefore I trust the ORM to do
> transactions right and not eat my data. If someday I need to do tricky
> things and handle transactions myself, I will take a hard look at how
> transactions work on my database backend, and if I get it wrong, it will be
> my fault.

I'll answer this argument with an analogy. Imagine that we're back in a world
where WSGI doesn't exist, and so Django needs to provide specific adapters to
webservers. Further, a specific webserver -- let's call it "MyHttp" -- became
very popular, and Django supports it. And then it turns out that, by default,
if you try to send a page longer than 16KB on HTTPS, MyHttp instead sends a
redirect to HTTP and serves the content unencrypted -- optimizing for using
less server resources. A user can prevent that by using some API to declare
their application as "sensitive". Consider also that, somehow, this
optimization is not very well known. It is explained somewhere within MyHttp's
documentation, but the generally known thing is that MyHttp supports HTTPS,
although "sometimes there are issues with that".

Would you claim that it isn't Django's place to make applications "sensitive"
by default? That if users employ MyHttp for secure Django applications, it is
their responsibility to find out all about the limitations of their web server,
and any expectations not based on reading its whole documentation are
irresponsible, and Django shouldn't care?

I, for one, expect better of Django.

Shai.

Hugo Chargois

unread,
Mar 23, 2016, 5:51:00 AM
to Django developers (Contributions to Django itself)
Le mardi 22 mars 2016 23:33:10 UTC+1, Shai Berger a écrit :
> It is Django's job to try, as best as it can, to fulfill these expectations. 

How could I disagree? Of course, if there is one single sensible, obvious default that would help 99.9% of users, as in your webserver analogy, it should be set.

> MySQL's REPEATABLE READ is unsuitable for use as a default by Django. 

This is where things are not as obvious... MySQL's REPEATABLE READ may have its flaws, it may cause repeating bugs because that level is a bit awry, with reads and writes that don't behave the same, but all in all, it IS a higher isolation level than READ COMMITTED. It DOES provide the REPEATABLE READ guarantee (for reads) that READ COMMITTED doesn't have.

For each "DELETE-SELECT-INSERT" bug that happens "because of" REPEATABLE READ, how many "SELECT-again-and-not-get-the-same-results" bugs are prevented by it?

That would be hard to say for sure.
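For the record, the guarantee in question looks like this (table `t` and its values are hypothetical; the two levels differ only at session A's second SELECT):

```sql
-- Session A                           -- Session B
BEGIN;
SELECT val FROM t WHERE id = 1;        -- A: returns 'old'
                                       UPDATE t SET val = 'new' WHERE id = 1;
                                       COMMIT;
SELECT val FROM t WHERE id = 1;
-- Under REPEATABLE READ: still 'old' (read from A's snapshot)
-- Under READ COMMITTED:  now 'new'   (sees B's committed change)
COMMIT;
```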

For that user-transaction behavior, I'm in favor of a documentation fix. On the "Database transactions" documentation page, have a note that reads something like: "Each backend can be configured with multiple isolation levels, and the isolation guarantees differ by isolation level and also between backends. In particular, MySQL's REPEATABLE READ is weird." Maybe offer a way to select the desired isolation level in code, even per transaction. Caveat emptor. After all, short of SERIALIZABLE, transactions will never be perfect.

But I still want to insist on the fact that the bug discussed in the ticket is quite independent of the choice of the default isolation level. Sure, setting a lower default isolation level fixes it coincidentally, but it can also be fixed (in two different ways, even) with a better use of the same isolation level. It shouldn't by itself justify changing the default isolation level.

Ben Cail

unread,
Mar 24, 2016, 4:34:16 PM
to django-d...@googlegroups.com
What's the current status on this? Is the .for_update() change mentioned in the bug report the best way to go? Is anyone working on a PR?

Shai Berger

unread,
Mar 24, 2016, 5:04:35 PM
to django-d...@googlegroups.com
While there are some different opinions expressed here (mine included), the
rough consensus seems to be to apply the for_update() patch to 1.8.x (it is
neither needed nor applicable to 1.9.x) and to change the transaction
isolation level for 1.10 and onward.
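As I understand the proposed for_update() patch, its SQL-level effect is to turn the existence check in the M2M update into a locking read, which in InnoDB reads the latest committed data rather than the transaction's REPEATABLE READ snapshot (table and column names hypothetical, as before):

```sql
BEGIN;
DELETE FROM app_m2m WHERE a_id = 1;
-- FOR UPDATE makes this a locking ("current") read: it sees committed rows,
-- not the possibly stale snapshot, so a concurrently removed row no longer
-- appears to exist:
SELECT b_id FROM app_m2m WHERE a_id = 1 FOR UPDATE;
INSERT INTO app_m2m (a_id, b_id) VALUES (1, 1);
COMMIT;
```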

To the best of my knowledge, nobody is working on patches yet; I have had no
time for it myself in the last few days.

Shai Berger

unread,
Mar 24, 2016, 5:24:23 PM
to django-d...@googlegroups.com
On Wednesday 23 March 2016 11:50:59 Hugo Chargois wrote:
>
> ... MySQL's REPEATABLE READ may have
> its flaws, it may cause repeating bugs because that level is a bit awry,
> with reads and writes that don't behave the same, but all in all, it IS a
> higher isolation level than READ COMMITTED. It DOES provide the REPEATABLE
> READ guarantee (for reads) that READ COMMITTED doesn't have.
>

As I have shown, this is only true under certain conditions.

>
> But I still want to insist on the fact that the bug discussed in the ticket
> is quite independent of the choice of the default isolation level. Sure,
> setting a lower default isolation level fixes it coincidentally, but it
> can also be fixed (in two different ways, even) with a better use of the
> same isolation level. It shouldn't by itself justify changing the
> default isolation level.

This is a sentiment I agree with (except for the "coincidental"
qualification). The change of isolation level is intended to solve a whole
class of bugs, most of them in user code; the bug that prompted this
discussion is merely one example, and by itself wouldn't justify the change.

Shai.