formset concurrency control

Carsten Fuchs

Oct 8, 2015, 9:54:36 AM
to django...@googlegroups.com
Dear Django fellows,

as far as I understand this, there seem to be two kinds of concurrency control:

- the one that occurs between request and save, as addressed by e.g. [1] and [2],
- the one that occurs between GET request and POST request, especially with formsets.

I'm currently trying to understand the latter (apparently even the Django Admin suffers
from this problem, https://code.djangoproject.com/ticket/11313).

What I am wondering is, when Django formsets are used, what is the canonical way to
address this problem?

It seems that each form in the formset must be given the PK of the object that it is
related to, but I don't think that that is sufficient:
If the formset in the GET request is constructed from a queryset of e.g. a list of
persons, ordered alphabetically by name, at the time of the POST request persons may
have been added or deleted, causing discrepancies.
Thus, each form in the formset must be given the PK, but we must *also* construct the
original queryset in the POST request, then compare these two.

Right?

Best regards,
Carsten


PS: To the best of my knowledge, this is also closely related to protecting against
erroneous or tampered POST requests, e.g. forms added to or removed from the formset; the
solution seems to be the same. I've described this in more detail at
https://groups.google.com/d/msg/django-users/jA2KUdp1MUE/pceQZPYHBgAJ – any help would
be very much appreciated.


[1] https://github.com/saxix/django-concurrency
[2] https://github.com/gavinwahl/django-optimistic-lock

Daniel Roseman

Oct 8, 2015, 12:48:34 PM
to Django users
Can you explain further why you think the pk is not sufficient? Ordering by name, or adding and removing entities, does not change the pk of other entities; indeed that's the whole point of a pk. What exactly are you concerned about?
--
DR.

Tim Graham

Oct 8, 2015, 3:03:29 PM
to Django users
I think the problem is also described in https://code.djangoproject.com/ticket/15574. Probably if we had a simple solution, that ticket wouldn't be open for 5 years. :-)

Carsten Fuchs

Oct 8, 2015, 4:10:39 PM
to django...@googlegroups.com
Hi Daniel,
Please consider this example:


from django.db import models

class Calendar(models.Model):
    pass

class CalendarEntry(models.Model):
    # on_delete is mandatory since Django 2.0; CASCADE matches the old default.
    cal = models.ForeignKey(Calendar, on_delete=models.CASCADE)
    ref_date = models.DateField()
    entry = models.CharField(max_length=40, blank=True)

    class Meta:
        unique_together = ('cal', 'ref_date')


We would like to present the user with a formset (for a specific Calendar instance) in
which e.g. October 2015 is shown in tabular form: each day of the month is shown with a
static date string, and next to it is an input field for the "entry" text.

The key issue is, of course, that no CalendarEntry instance exists for days that don't
have an entry.

Now, the GET request is dealt with relatively easily (as explained in the other thread
https://groups.google.com/d/msg/django-users/jA2KUdp1MUE/pceQZPYHBgAJ, I intentionally
use a plain Form, not a ModelForm, both for better understanding and because the
performance implications are unclear to me):


from django import forms

class CalendarEntryForm(forms.Form):
    CalEntr_id = forms.IntegerField(required=False, widget=forms.HiddenInput())
    ref_date = forms.DateField(widget=forms.HiddenInput())
    entry = forms.CharField(required=False, max_length=40)


The form's ref_date field is needed for all cases where a CalendarEntry instance does
not (yet) exist. In the part of the view where the GET request is processed, the formset
would be constructed e.g. like this:


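# Example is specific to October 2015; MyCal is the Calendar instance being edited.
from datetime import date

from django.forms import formset_factory

# One initial dict per day, so that days without a CalendarEntry get a form, too.
october_inits = [{"ref_date": date(2015, 10, i + 1)} for i in range(31)]

# Fill in the id and text of the entries that already exist.
for ce in CalendarEntry.objects.filter(cal=MyCal, ref_date__year=2015,
                                       ref_date__month=10):
    init = october_inits[ce.ref_date.day - 1]
    init["CalEntr_id"] = ce.id
    init["entry"] = ce.entry

# extra=0: exactly one form per day, no additional blank form.
CalendarEntryFormSet = formset_factory(CalendarEntryForm, extra=0)
formset = CalendarEntryFormSet(initial=october_inits)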


So far, all is good, but problems start to occur when we get the form back in the POST
request. When comparing the data from the POST request to those of a newly created
october_inits list, CalendarEntry instances mentioned in the POST request may have been
deleted in the meantime, or been replaced by an entirely different CalendarEntry
instance. On other days, for which the POST data assumed that no CalendarEntry existed
yet, an instance may exist now, etc.
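For the calendar example, these cases can at least be told apart by keying both sides by
date. A minimal sketch in plain Python, assuming `current` maps every date of the month
to the id of the CalendarEntry that exists now (re-read in the POST request with the same
filter as in the GET request, `None` if there is none) and `submitted` maps each posted
ref_date to its posted CalEntr_id (`None` for days that had no entry). The name
`classify_days` is hypothetical.

```python
from datetime import date


def classify_days(current, submitted):
    """Return {ref_date: problem} for every day whose POST data is stale."""
    problems = {}
    for day in sorted(set(current) | set(submitted)):
        if day not in current:
            # Not a day that was rendered at all: a tampered or malformed POST.
            problems[day] = "unexpected date"
            continue
        if day not in submitted:
            problems[day] = "form for this date is missing"
            continue
        now_id, sent_id = current[day], submitted[day]
        if now_id == sent_id:
            continue  # unchanged since the GET request
        if sent_id is None:
            problems[day] = "entry was created in the meantime"
        elif now_id is None:
            problems[day] = "entry was deleted in the meantime"
        else:
            problems[day] = "entry was replaced (or its id was tampered with)"
    return problems


# Example: the Oct 2 entry was deleted, an Oct 3 entry was created by someone else.
current = {date(2015, 10, 1): 5, date(2015, 10, 2): None, date(2015, 10, 3): 8}
submitted = {date(2015, 10, 1): 5, date(2015, 10, 2): 6, date(2015, 10, 3): None}
print(classify_days(current, submitted))
```

Note that a tampered id that happens to equal the current id for that day is
indistinguishable from a correct one, which is harmless because it points to the right
object anyway.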

Obviously, the number of days in October is fixed at 31, but if this were something else,
such as a list of persons to which CalendarEntry objects are attached, more or fewer
persons could exist when the POST request is processed.

While some of these cases can be detected with the pk alone, and others possibly with
the help of the unique-constraint and the related exception, I'm not sure if this can
really cover all the cases.

For example, if someone tampered with the POST request and submitted a wrong pk – from
the pk and the related instance alone we probably cannot tell that there is a problem,
or what it is.

Well, I readily admit that this hits the limits of my skills and thoroughly confuses me,
hence my question of how all this is best or typically dealt with. :-)

Best regards,
Carsten

Carsten Fuchs

Oct 9, 2015, 11:16:54 AM
to django...@googlegroups.com
Hi Tim,

On 08.10.2015 at 21:03, Tim Graham wrote:
> I think the problem is also described in https://code.djangoproject.com/ticket/15574.
> Probably if we had a simple solution, that ticket wouldn't be open for 5 years. :-)

:-)

Yes, having read all of it, I too think that #15574 describes the same problem.

The more I think about it, reconstructing the queryset in the POST request by the same
parameters that were used to construct the queryset for the initial formset data in the
GET request, in *addition* to the primary keys in the POST request, seems like an
increasingly good idea to me:

In formset validation, comparing the queryset objects with the PKs from the POST request
should yield an exact match (validation successful).

If there is any mismatch, it is probably best to just tell the user that there was a
problem (in some specific cases it might be possible to fix and/or communicate the
mismatch, but not in general). He/she may lose some work (having to reload and fill in
the entire form again), but that still seems a lot better to me than saving something
that the user has not seen (or has seen differently) and that may come as a complete
surprise.
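The comparison and the resulting error report can be sketched in plain Python. This
assumes the queryset is rebuilt in the POST request with the same filter and ordering as
in the GET request, and that each submitted form carries the PK it was rendered with;
`pk_mismatches` is a hypothetical helper name.

```python
def pk_mismatches(expected_pks, submitted_pks):
    """Compare the PKs of the re-run queryset with the PKs posted back.

    expected_pks:  PKs from the queryset rebuilt in the POST request, in the
                   same order as in the GET request.
    submitted_pks: PKs from the hidden fields of the submitted forms.
    Returns a list of problems; an empty list means an exact match.
    """
    problems = []
    if len(submitted_pks) != len(expected_pks):
        problems.append(
            f"expected {len(expected_pks)} forms, got {len(submitted_pks)}")
    expected_set = set(expected_pks)
    submitted_set = set(submitted_pks)
    for pk in expected_pks:
        if pk not in submitted_set:
            # Added since the GET request (or its form was removed from the POST).
            problems.append(f"pk {pk} exists now but was not submitted")
    for pk in submitted_pks:
        if pk not in expected_set:
            # Deleted since the GET request, or a tampered pk.
            problems.append(f"pk {pk} was submitted but is not in the queryset")
    if not problems and list(submitted_pks) != list(expected_pks):
        # Same objects, but forms and objects no longer line up one by one.
        problems.append("forms and objects are no longer in the same order")
    return problems


# Example: someone added person 9 between the GET and the POST request.
print(pk_mismatches([3, 1, 7, 9], [3, 1, 7]))
# ['expected 4 forms, got 3', 'pk 9 exists now but was not submitted']
```

If the list is non-empty, nothing is saved and the problems are shown to the user. In a
Django formset, one natural place for such a check would be an overridden `clean()` of
the formset class that raises `forms.ValidationError`, which then shows up in
`formset.non_form_errors()`.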

This approach looks reasonably simple and stable to me, and I'll definitely try it
out. As a second but independent step, it should even be relatively easy to add
fine-grained concurrency control to it, as mentioned in my first post.
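The fine-grained step is optimistic locking in the style of [1] and [2]. A minimal,
library-independent sketch, assuming each form carries a hidden version value as it was
rendered in the GET request, and each object has an integer `version` that is
incremented on every save. `Entry`, `stale_pks` and `save_if_current` are hypothetical
names, not the API of either library.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a model instance with a version counter."""
    pk: int
    version: int
    text: str = ""


def stale_pks(objects_by_pk, submitted_versions):
    """Return the PKs whose version changed since the GET request.

    PKs missing from objects_by_pk are left to the PK comparison above.
    """
    return sorted(
        pk for pk, seen in submitted_versions.items()
        if pk in objects_by_pk and objects_by_pk[pk].version != seen
    )


def save_if_current(obj, seen_version, new_text):
    """Apply the change only if nobody saved the object in between."""
    if obj.version != seen_version:
        return False
    obj.text = new_text
    obj.version += 1  # every save bumps the version
    return True
```

With a real database the check and the increment must happen atomically, e.g. as a
single `UPDATE ... WHERE pk = ... AND version = ...` whose affected row count is checked;
in Django, `QuerySet.update()` returns the number of matched rows, which could serve
that purpose.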

As always, any additional comments or thoughts would very much be appreciated!

Best regards,
Carsten

Carsten Fuchs

Oct 29, 2015, 12:22:42 PM
to django...@googlegroups.com
Hi all,

continuing my previous post, I've now implemented most of the ideas mentioned in this
thread, and for completeness and future reference I would like to add some related
findings and thoughts:

As mentioned in my previous post, reconstructing the formset's initial data (the
queryset) in the POST request by the same parameters that were used to construct the
queryset for the initial formset data in the GET request, in *addition* to the primary
keys received in the POST request, turned out to work very well. That is, the code is
roughly like this:

# Inits is made from the view's appropriate queryset (for GET and POST alike).
Inits = ...

if request.method == "POST":
    formset = TestFormSet(request.POST, initial=Inits)
else:
    # In the case of the GET request:
    formset = TestFormSet(initial=Inits)


This is how the initial data is useful in the POST request:

- It is required for validation in order to be able to detect "high-level" kinds of
concurrency related mismatches (number of instances changed, deleted instances,
unexpected new or replaced instances).

- We also need the same data for "low-level" concurrency control, e.g. checking a
`VERSION` number as e.g. in [1] and [2].

- As a side effect of the same validation steps, tampering with the POST data is
detected as well.

- The initial data (`Inits` above) is also a convenient place for storing arbitrary
extra data, e.g. the actual model instances, that can be used for rendering additional
information in the template.

- In the case of successful formset validation, the readily available queryset
instances can be used for storing and saving the submitted data.
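The last two points can be sketched in plain Python so that it runs without a Django
project. `Item` is a hypothetical stand-in for a model instance (the keys `CalEntr_id`
and `entry` follow the calendar example), and `build_inits` / `apply_cleaned_data` are
hypothetical helper names. The sketch assumes the PK comparison has already succeeded, so
`Inits` and the submitted forms line up one to one. In Django, each dict in `initial`
becomes the corresponding form's `initial`, and keys that are not form fields (here
`"obj"`) should simply be ignored by the fields.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a model instance read from the queryset."""
    id: int
    entry: str


def build_inits(items):
    """One initial dict per object; "obj" is extra data, not a form field."""
    return [{"CalEntr_id": it.id, "entry": it.entry, "obj": it} for it in items]


def apply_cleaned_data(inits, cleaned_data_list):
    """After successful validation, update the instances read in this request."""
    changed = []
    for init, cd in zip(inits, cleaned_data_list):
        obj = init["obj"]
        if obj.entry != cd["entry"]:
            obj.entry = cd["entry"]
            changed.append(obj)  # with Django models, obj.save() would go here
    return changed
```

Since the instances are already in memory, no second round of queries is needed to save
the changed ones.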


I still use custom Forms for all this, not ModelForms, because:

- As we need the model instances anyway and already have them from the queryset as
described above, using a ModelForm(-set) that implicitly instantiates them all again
would not be efficient.

- I have a lot of cases where a field is flagged as "required" in the model, but "not
required" in the form. This helps with making entire forms optional in the formset
(they won't get saved if not filled out), while still making clear what is required when
a model instance is eventually saved. This is especially helpful when forms are rendered
for which a model instance does not necessarily exist and is not necessarily created, as
e.g. in my CalendarEntry example described in another post of this thread ([3]).

- Foreign keys are problematic. Although a ModelForm covers a FK just as it covers
the model's PK, if we wish to extend the concept of validating `VERSION` numbers (as
above) also to the related models, the form must be augmented with an appropriate
version number field for each FK. (I have not checked, but adding such extra fields is
probably possible with ModelForms as well.)
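For the calendar example, mapping an optional form to a model operation after validation
might look like the following sketch. `decide_action` is a hypothetical helper; what
should happen when the user clears an existing entry is not stated in this thread, and
the sketch simply assumes that the entry is then deleted.

```python
def decide_action(existing_id, entry_text):
    """Decide what to do with one calendar form after successful validation.

    existing_id: CalEntr_id of the form (None if no CalendarEntry existed).
    entry_text:  the cleaned "entry" value; the form field is optional.
    """
    text = entry_text.strip()
    if existing_id is None:
        # Blank forms for days without an entry are skipped, not saved.
        return "create" if text else "skip"
    # Assumption: clearing the text of an existing entry deletes it.
    return "update" if text else "delete"
```

Only for "create" does a new instance come into being, and that is where the model-level
requirements (e.g. `cal` and `ref_date`) have to be satisfied.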


Well, those are my findings so far. Although this seems to work well, I'd very much
appreciate any further comments and thoughts.

Best regards,
Carsten
[3] https://groups.google.com/d/msg/django-users/R7wJBTlC8ZM/N3dNlMrGCwAJ