Eliminating inter-request race conditions

Nick Farrell

Dec 27, 2021, 8:59:55 AM
to Django users
Hi all.

I've been using Django for quite a number of years now, in various ways. Most of the time, I find myself needing to create custom solutions to solve what appears to be a very common problem. 

During the Christmas downtime, I decided to scratch this itch, and am putting together what will hopefully turn into a solution to what I'll describe below. I'm writing this here to get a sense of what the Django community sees in this: is this a niche problem, is it shared by a few others, or is the lack of these features a fundamental oversight in the core Django product?

The problems (from highest to lowest priority):

1) a form is rendered, the data is changed by a different task/request, then the form is submitted, overwriting the recent changes.

Whenever models can be modified by multiple users (or even the same user in different windows/tabs of their browser), this can happen. Also, if there are any background processes which can modify the data (e.g. celery, or various data synchronisation services), it's possible.
In some situations this is no big deal, as the users do not really care, or you know that the latest data would overwrite the previous data anyway. But in general, this is a major risk, particularly when dealing with any health or financial data. 

2) Not being able to safely lock a model/queryset beyond the lifetime of the request.

This is related to problem 1, and solving problem 2 may in some circumstances solve problem 1 - but not always. For example, depending on how the lock is implemented, a "rogue" task/request may bypass the locking mechanism and force a change to the underlying data. Also, if a lock is based on a session, a user may have multiple tabs open in the same browser, using the same session state (via shared cookies).

Solving this problem will reduce the chance of a conflict when a person does post a form update, meaning fewer tears.

3) Not knowing that data has changed on the server until you submit a form.

Ideally there would be a means for someone viewing/editing a form to be notified immediately if data changes on the server, obsoleting the current form. This reduces the time wasted completing a form which is already known to be out of sync and will need to be redone anyway (as long as problem 1 is solved; otherwise, there'll be data loss).

4) Smarter form validation

There are three types of missing validation: 
- the first is that the default widgets do not support even very simple client-side validation. For example, a text field might need to match a regular expression. 
- the second type is an ability to provide (in the model definition) arbitrary javascript which can be executed client-side to provide richer realtime validation during data entry.
- the third type involves effectively providing provisional form data to the server, and having Django validate() the form content without actually saving the result. This would allow (for example) inter-field dependencies to be evaluated without any custom code, providing near-realtime feedback to the user that their form is invalid


The solutions
This is based on a day or so's experimentation, and I very much welcome any feedback, both on the usefulness of solving these problems in general and on better ways to solve them, before I go too far down any rabbit holes.

Enhanced forms
- when rendering a form (using e.g. as_p()), alongside the normal INPUT DOM elements, include additional hidden fields which store a copy of each form field's initial value. 
- when a form is submitted, compare these hidden values against the current value in the database. If any of these do not match, the clean() method can raise a ValidationError, allowing the user to know what has happened, and that they will need to reload the form and try again, with the new stored values.

This solution is minimally invasive. As well as modifying as_p() and friends, a django template tag can also be exposed for those users who are rendering their forms in a different way.
Note that there is no reliance on additional attributes in the models: the CAS-like checking performed is explicitly on the rendered form fields; it does not matter if other model fields' values have changed, as someone editing the form can neither see these field values, nor will their POSTing modify these other fields' values.
(I have implemented the above already, for generic model forms using a single model)
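
The comparison described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from my implementation: `find_conflicts` and the field names are hypothetical, and in a real form the `original` mapping would be read from the hidden fields while `current` would come from a fresh database read inside clean().

```python
from __future__ import annotations

from typing import Any, Mapping


def find_conflicts(
    original: Mapping[str, Any],
    current: Mapping[str, Any],
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (value_shown_to_user, value_now_stored)} for changed fields.

    Only fields present in `original` (i.e. rendered on the form) are checked,
    so changes to other model fields are deliberately ignored.
    """
    return {
        name: (shown, current.get(name))
        for name, shown in original.items()
        if current.get(name) != shown
    }


# Values echoed back by the hidden fields vs. what the database holds now.
shown_on_form = {"dose_mg": "50", "notes": "take with food"}
stored_now = {"dose_mg": "75", "notes": "take with food", "updated_by": "celery"}

print(find_conflicts(shown_on_form, stored_now))  # {'dose_mg': ('50', '75')}
```

A non-empty result is the case where clean() would raise a ValidationError; the change to `updated_by` is not reported because that field was never rendered.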

Locking
- provide a mixin which can be used on selected models. When used, a view (usually some sort of form view) can attempt to lock() the model. If successful (because it's not currently locked to someone else), only they can perform writes to the model, until the lock expires. 
- if the lock has expired, anyone (including the user who took out the expired lock) may operate on the model instance.
- the lock can be configured to either use the standard database ORM, or redis. Redis will be more performant, but should not be a hard requirement
- there will be pain points associated with using this without the websocket solution, detailed below: there will not be a clean way to maintain the lock, if the time between consecutive requests is greater than the timeout value
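
The locking semantics in the list above might look roughly like the following. It is a minimal sketch, assuming an in-memory dict in place of the ORM table or redis; `ExpiringLocks`, `acquire` and `may_write` are hypothetical names, and the injectable `clock` exists only so that expiry can be demonstrated without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Tuple


@dataclass
class ExpiringLocks:
    """In-memory stand-in for a lock table or redis keys (illustration only)."""

    timeout: float = 30.0
    clock: Callable[[], float] = time.monotonic
    _held: Dict[Hashable, Tuple[str, float]] = field(default_factory=dict)

    def acquire(self, key: Hashable, owner: str) -> bool:
        """Take or renew the lock; fail if another owner holds a live lock."""
        now = self.clock()
        holder = self._held.get(key)
        if holder is not None:
            held_by, expires_at = holder
            if held_by != owner and expires_at > now:
                return False
        self._held[key] = (owner, now + self.timeout)
        return True

    def may_write(self, key: Hashable, owner: str) -> bool:
        """Anyone may write once the lock has expired or was never taken."""
        holder = self._held.get(key)
        if holder is None:
            return True
        held_by, expires_at = holder
        return held_by == owner or expires_at <= self.clock()


now = [0.0]  # fake clock so the example is deterministic
locks = ExpiringLocks(timeout=10, clock=lambda: now[0])
print(locks.acquire(("patient", 1), "alice"))  # True
print(locks.acquire(("patient", 1), "bob"))    # False: alice holds a live lock
now[0] = 11.0                                  # alice's lock has now expired
print(locks.may_write(("patient", 1), "bob"))  # True
```

Calling `acquire` again as the same owner renews the expiry, which is the step that is hard to do cleanly between requests without something like the websocket described below.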

Websocket
- provide a model mixin to enable websocket monitoring
- use Django Channels to expose a websocket consumer
- provide a templatetag which will include appropriate javascript into a web page to initialise the client connection (if any forms are configured to be monitored)
- when the client initialises, it detects the form fields (as per the 'Enhanced Forms' solution) and registers the model instance(s) with the server, via the websocket.
- whenever a monitored instance changes in Django, a signal is raised, pushing notifications to any clients, along with the new values
- the client can immediately compare the new instance values to the original values on the form (stored in the hidden fields) and can update the widgets directly if required (e.g. setting a CSS class to indicate the input is invalid, and updating the validation message shown alongside that).
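
The register/notify flow in the list above can be sketched without any websocket machinery. Everything here is a stand-in chosen for illustration: `ChangeRegistry` plays the role that a Channels consumer plus a save signal would play on the server, and `stale_fields` is the comparison the client-side javascript would perform against the hidden fields.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

InstanceKey = Tuple[str, int]  # (model label, primary key)
Listener = Callable[[Dict[str, Any]], None]


class ChangeRegistry:
    """Maps monitored instances to the clients that registered for them."""

    def __init__(self) -> None:
        self._listeners: Dict[InstanceKey, List[Listener]] = defaultdict(list)

    def register(self, key: InstanceKey, listener: Listener) -> None:
        self._listeners[key].append(listener)

    def notify(self, key: InstanceKey, new_values: Dict[str, Any]) -> int:
        """Push the new values to every registered client; return how many."""
        listeners = self._listeners.get(key, [])
        for listener in listeners:
            listener(dict(new_values))
        return len(listeners)


def stale_fields(original: Dict[str, Any], pushed: Dict[str, Any]) -> List[str]:
    """Client-side check: which rendered fields no longer match the server?"""
    return sorted(
        name for name in original
        if name in pushed and pushed[name] != original[name]
    )


registry = ChangeRegistry()
received: List[Dict[str, Any]] = []
registry.register(("clinic.Patient", 7), received.append)
registry.notify(("clinic.Patient", 7), {"dose_mg": "75", "notes": "take with food"})

print(stale_fields({"dose_mg": "50", "notes": "take with food"}, received[0]))  # ['dose_mg']
```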


A final aspect of the solution is the javascript widgets, but I feel my post is already about 5 times too long.

Any thoughts/comments are welcome.

Thanks.

Carsten Fuchs

Dec 28, 2021, 5:35:09 AM
to django...@googlegroups.com
Hi Nick,

maybe this is a case for optimistic locking?
Does the thread at https://groups.google.com/d/msg/django-users/R7wJBTlC8ZM/MIzvYkWyCwAJ help?

Best regards,
Carsten


Nick Farrell

Dec 28, 2021, 5:53:22 PM
to Django users
Thanks for the reference Carsten.

I believe the approach I am taking, regarding optimistic locking, is superior to what is proposed in that thread. Specifically:
- there is no need for a special version field to be added to the model
- because forms only update specific fields in the associated model(s), there is no point in unnecessarily invalidating the form submission if unrelated model fields have been changed in the background; if a form does not show a field to begin with, having its value change would not alter the behaviour a person would make when editing the form.

Really I think this should be a core part of Django: just as you include CSRF protection virtually everywhere, you would be doing this. It is very rare that this feature would be undesired, and many Django developers will be getting burnt by the lack of CAS-like behaviour.

I'll continue to tinker with this either way, if only to use internally, if there isn't general interest.

cheers,

Nick.

Carsten Fuchs

Dec 31, 2021, 5:08:55 AM
to django...@googlegroups.com
Hi Nick,

I've thought a bit more about this, but it seems that my requirements are different from yours: E.g. I don't need the client-side validation and especially don't want the complexity that goes with it; on the other hand, I need something that also works with formsets, not only with individual forms.

On 27.12.21 at 06:36, Nick Farrell wrote:
> The problems (from highest to lowest priority):
> [...]
> 2) Not being able to safely lock a model/queryset beyond the lifetime of the request.

I don't quite understand this. It sounds like pessimistic locking?

> 3) Not knowing that data has changed on the server until you submit a form.
> 4) Smarter form validation

Well, this comes at the cost of the complexity to implement this. For me, at least at this time, the cost is much too high to consider this.

> The solutions
>
> Enhanced forms
> - when rendering a form (using e.g. as_p()), alongside the normal INPUT DOM elements, include additional hidden fields which store a copy of each form field's initial value.

I don't think that having hidden fields with the initial values is necessary: In your view, you can initialize the form like this:

    if request.method == 'POST':
        form = SomeForm(request.POST, initial=init)
        ...

Note the `initial` parameter: It is used just as it is in the GET request. This allows you to use `form.changed_data` and `form.has_changed()` in form validation.

Note that the above, i.e. reconstructing the initial values also for POST requests, is useful both for individual forms and even more with multiple forms, i.e. formsets: If for example you edit a list (formset) of appointments, in formset validation you must make sure that the list of appointments has not changed in the meantime (e.g. appointments were not inserted, replaced or deleted).
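
For the formset case, the check that no rows were inserted, replaced or deleted can be sketched as a set comparison. This is only an illustration: `diff_rows` is a hypothetical helper, and it assumes the primary keys that were rendered are available again when the POST is validated.

```python
from typing import Dict, Iterable, List


def diff_rows(rendered_ids: Iterable[int], current_ids: Iterable[int]) -> Dict[str, List[int]]:
    """Compare the rows shown to the user with the rows stored now."""
    rendered, current = set(rendered_ids), set(current_ids)
    return {
        "inserted": sorted(current - rendered),
        "deleted": sorted(rendered - current),
    }


# Appointment 1 was removed and appointment 4 added while the form was open.
print(diff_rows([1, 2, 3], [2, 3, 4]))  # {'inserted': [4], 'deleted': [1]}
```

A replaced appointment shows up as one deletion plus one insertion; formset validation would fail if either list is non-empty.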

Best regards,
Carsten

Nick Farrell

Dec 31, 2021, 5:31:34 AM
to Django users
Good to hear from you Carsten. Thanks in advance for your comments. I'll see what I can address now:

>> 2) Not being able to safely lock a model/queryset beyond the lifetime of the request.
>
> I don't quite understand this. It sounds like pessimistic locking?

Correct. To be clear, I am not advocating this behaviour by default, but making it as seamless as possible to enable when required, rather than needing to attempt to hand-roll safe locking semantics each time it's needed.
 
>> 3) Not knowing that data has changed on the server until you submit a form.
>> 4) Smarter form validation
>
> Well, this comes at the cost of the complexity to implement this. For me, at least at this time, the cost is much too high to consider this.

Certainly there is increased complexity. For the websites I am involved in (primarily health-related ones), if I don't end up providing a Django-based solution, product owners end up demanding a SPA-based solution or similar, with even greater complexity in the development stack, not to mention testing.

 
>> The solutions
>>
>> Enhanced forms
>> - when rendering a form (using e.g. as_p()), alongside the normal INPUT DOM elements, include additional hidden fields which store a copy of each form field's initial value.
>
> I don't think that having hidden fields with the initial values is necessary: In your view, you can initialize the form like this:
>
>     if request.method == 'POST':
>         form = SomeForm(request.POST, initial=init)
>         ...
>
> Note the `initial` parameter: It is used just as it is in the GET request. This allows you to use `form.changed_data` and `form.has_changed()` in form validation.

But where does "init" come from? How can you know what version of the model instance was shown to the user when the form was rendered? There are only two ways I can see to achieve this: use a full-blown "rowversion" CAS pattern, where there is a dedicated monotonic column on each table which automatically increases with each update, or the method I propose/use, where the original form values are provided via the user agent when the form is POSTed. I guess a third option would be to cache the form values server-side using redis each time a form is served, and provide an ID to it, perhaps even using the CSRF token as the key.
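
The third option could look something like this. It is a rough sketch under loud assumptions: a plain dict stands in for redis, the key is a fresh random token rather than the CSRF token, and `RenderedFormCache`, `remember` and `recall` are names invented for the example.

```python
import secrets
from typing import Any, Dict, Optional


class RenderedFormCache:
    """Remembers what each rendered form showed, keyed by an opaque token."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}  # stand-in for redis

    def remember(self, values: Dict[str, Any]) -> str:
        """Call when serving the form; embed the returned token in the page."""
        token = secrets.token_urlsafe(16)
        self._store[token] = dict(values)
        return token

    def recall(self, token: str) -> Optional[Dict[str, Any]]:
        """Call on POST; returns the values shown, or None if unknown or already used."""
        return self._store.pop(token, None)


cache = RenderedFormCache()
token = cache.remember({"dose_mg": "50"})  # GET: form rendered
print(cache.recall(token))                 # POST: {'dose_mg': '50'}
print(cache.recall(token))                 # second use of the same token: None
```

Compared with the hidden fields, this keeps the original values off the client but adds server-side state that needs an expiry policy.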

Perhaps I am missing something - if there is a way to retrieve the initial value of the form automatically, I would love to use it.

Regarding formsets, the same applies: I agree it needs to support them, and by embedding the original values into each form in the formset, it should correctly respect those values when the formset is submitted.

If it's of any assistance, I can push some code and provide some examples for you to try out. The same app I am using now should be quite easy to clone and evaluate. 

> Best regards,
> Carsten
 
Guten Rutsch (happy New Year).

Nick 

Carsten Fuchs

Dec 31, 2021, 7:08:41 AM
to django...@googlegroups.com
Hello,

On 31.12.21 at 11:31, Nick Farrell wrote:
> Correct. To be clear, I am not advocating this behaviour by default, but making it as seamless as possible to enable when required, rather than needing to attempt to hand-roll safe locking semantics each time it's needed.

Thanks for the clarification! It just confused me a bit because this approach seems to be complementary (or even barely related) to the other aspects.

> Certainly there is increased complexity. For the websites I am involved in (primarily health-related ones), if I don't end up providing a django-based solution, product owners end up demanding a SPA-based solution or similar, with the even-greater complexity to the development stack, not to mention testing.

Ahh, a very good point! :-)
But still I wonder if client-side validation and feedback should be optional? That is, if I had to do this myself, I'd hope to find a solution first that works 100 % without client-side effects. This would also help a lot with correctness and testing. Then I'd put all eye candy on top, but keeping it strictly optional.

>     if request.method == 'POST':
>         form = SomeForm(request.POST, initial=init)
>         ...
>
> Note the `initial` parameter: It is used just as it is in the GET request. This allows you to use `form.changed_data` and `form.has_changed()` in form validation.
>
> But where does "init" come from? How can you know what version of the model instance was shown to the user when the form was rendered?

Hmmm. Yes, I see your point...

> There are only two ways I can see to achieve this: use a full-blown "rowversion" CAS pattern, where there is a dedicated monotonic column on each table which automatically increases with each update, or the method I propose/use, where the original form values are provided via the user agent when the form is POSTed.

Then this would have to be tamper-proof, wouldn't it?
(e.g. using https://itsdangerous.palletsprojects.com )

It might even be possible to serialize the entire state of the object into a single hidden field and sign it on GET, then check the signature and deserialize on POST. Or maybe, depending on the exact requirements, even the checksum of the old state would be enough in order to detect that something changed between the old version of the model (as it was when the user started editing it) and the current version (at the time the POST request arrives). This would roughly correspond to a version number without requiring an explicit field on the model.
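
Both ideas can be sketched with the standard library alone. This is only an illustration, not what itsdangerous does internally: `SECRET`, `sign`, `unsign` and `checksum` are names made up for the example, and in a Django project the key would presumably be derived from the project's secret key.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

SECRET = b"demo-only-secret"  # assumption: a real key would come from settings


def _canonical(state: Dict[str, Any]) -> bytes:
    """Serialize deterministically so equal states give equal bytes."""
    return json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")


def checksum(state: Dict[str, Any]) -> str:
    """Fingerprint of the state; acts like a version number without a model field."""
    return hashlib.sha256(_canonical(state)).hexdigest()


def sign(state: Dict[str, Any]) -> str:
    """On GET: value for a single hidden field, in the form '<mac>:<json>'."""
    payload = _canonical(state)
    mac = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return mac + ":" + payload.decode("utf-8")


def unsign(token: str) -> Dict[str, Any]:
    """On POST: verify the signature, then deserialize the original state."""
    mac, _, payload = token.partition(":")
    expected = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("signature mismatch")
    return json.loads(payload)


shown = {"dose_mg": 50, "notes": "take with food"}
token = sign(shown)
print(unsign(token) == shown)                               # True
print(checksum(shown) == checksum({**shown, "dose_mg": 75}))  # False: state changed
```

With the checksum variant, only the fingerprint goes into the hidden field, and on POST it is compared with the checksum of the current database state.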

> Guten Rutsch.

Danke gleichfalls! :-)
Thanks, the same to you!

Best regards,
Carsten

Nick Farrell

Jan 8, 2022, 7:54:23 PM
to Django users
I thought I'd post a little update, as I'm fairly happy with my progress:

Here's the repo's readme. I haven't actually pushed the package to pypi so don't try to follow the instructions yet, but any feedback on the README's content is very welcome: https://github.com/nicois/nango/blob/develop/README.md . Hopefully in the next couple of days I'll push the package to pypi so anyone who's interested can make sure it works as advertised.

> But still I wonder if client-side validation and feedback should be optional? That is, if I had to do this myself, I'd hope to find a solution first that works 100 % without client-side effects. This would also help a lot with correctness and testing. Then I'd put all eye candy on top, but keeping it strictly optional.

The client-side validation is latent in this release. That is, there is some code which will provide client-side validation with websockets, but it is disabled unless explicitly enabled in settings.py. I've put in the beginnings of some automated tests, which I can expand as I proceed. I agree that the server-side data integrity is more important, lower risk and easier to test.
 
>> There are only two ways I can see to achieve this: use a full-blown "rowversion" CAS pattern, where there is a dedicated monotonic column on each table which automatically increases with each update, or the method I propose/use, where the original form values are provided via the user agent when the form is POSTed.
>
> Then this would have to be tamper-proof, wouldn't it?
> (e.g. using https://itsdangerous.palletsprojects.com )

No, the (current) intention is not to make this tamper-proof. If you think about it, there is no value in trying to protect against an authenticated and authorised user who wants to submit incorrect values. While there could be HMACs etc., I don't see any value at all, as a malicious user does not need to tamper with the original values to submit bad data.
 
> It might even be possible to serialize the entire state of the object into a single hidden field and sign it on GET, then check the signature and deserialize on POST. Or maybe, depending on the exact requirements, even the checksum of the old state would be enough in order to detect that something changed between the old version of the model (as it was when the user started editing it) and the current version (at the time the POST request arrives). This would roughly correspond to a version number without requiring an explicit field on the model.

Remember also that there is little to no value in checking whether fields on the model have changed if those fields are not shown on the form: firstly because the form will not update those fields in the database, and secondly because if only those "invisible" fields change, the end-user would see the same thing on the form, and would not alter their behaviour.

The value here is in showing a user that one of the fields they are in the process of editing has changed while they have had the form open, and ensuring that they do not accidentally clobber someone else's changes.

Thanks for your feedback so far. 
 
Nick