Validation of m2m

617 views
Skip to first unread message

Federico Capoano

unread,
Dec 3, 2015, 7:28:29 AM12/3/15
to Django developers (Contributions to Django itself)
Hi everybody,

I am sure it has happened to many of you.

Validating m2m BEFORE saving the relationships is very hard and time consuming.

Now this solution:

Proposes to solve it with a ModelForm in the admin.
Cool, that works.

But, if I want to enforce validation on the model, to avoid corrupted data, I notice the signal is executed in a transaction block, in which if I raise a ValidationError I get the following:

Traceback (most recent call last):
  File "/var/www/django-netjsonconfig/django_netjsonconfig/tests/test_device.py", line 106, in test_m2m_validation
    d.templates.add(t)
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/models/fields/related_descriptors.py", line 843, in add
    self._add_items(self.source_field_name, self.target_field_name, *objs)
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/sortedm2m/fields.py", line 138, in _add_items
    for val in vals:
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/models/query.py", line 258, in __iter__
    self._fetch_all()
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/models/query.py", line 1074, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/models/query.py", line 158, in __iter__
    for row in compiler.results_iter():
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 806, in results_iter
    results = self.execute_sql(MULTI)
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 852, in execute_sql
    cursor.execute(sql, params)
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/backends/utils.py", line 59, in execute
    self.db.validate_no_broken_transaction()
  File "/home/nemesis/.virtualenvs/djnetconfig3/lib/python3.4/site-packages/django/db/backends/base/base.py", line 429, in validate_no_broken_transaction
    "An error occurred in the current transaction. You can't "
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.

This is surely an area that needs improvement in django.

Why is it so hard?

Best regards
Federico

Aymeric Augustin

unread,
Dec 3, 2015, 7:43:21 AM12/3/15
to django-d...@googlegroups.com
Hello Frederico,

It appears that you're hitting the problem described in the "Avoid catching exceptions inside atomic!" warning on this page:
https://docs.djangoproject.com/en/1.8/topics/db/transactions/#handling-exceptions-within-postgresql-transactions

To obtain that sort of result, I suppose you must be catching an IntegrityError, re-raising a ValidationError, which Django in turn catches, and then you hit that traceback.

Adding an atomic block inside your try/catch block that catches the IntegrityError will resolve that particular problem — putting that part of the discussion into django-users territory.

If this isn't what happens, please post your code and ask for help on django-users.

-- 
Aymeric

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/2e6e82d0-0645-4fd7-8905-d327c99b6352%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Aymeric.

Federico Capoano

unread,
Dec 3, 2015, 11:04:22 AM12/3/15
to Django developers (Contributions to Django itself)
Thanks Aymeric,

I decided to post this problem on django-developers because I've read this ticket:
Sorry for omitting this information.

Has there been a discussion about this topic already?

Would it be hard to implement an easier solution into django?

I spent a few hours working on this issue, I consider myself fluent with django and it's quite shocking I had to put such an amount of effort just to validate many2many relationships before they are saved to the database.

IMHO it would be better if we could do one of these two (or even both) things:

1. make this process easier in future django versions
2. document the current best practice to solve this problem in current django to save people's time

What do people you think?


What do you think of it? Can it be improved in some way?

Federico

Tim Graham

unread,
Dec 3, 2015, 11:21:06 AM12/3/15
to Django developers (Contributions to Django itself)
Here's an open ticket about adding model level validation for many-to-many: https://code.djangoproject.com/ticket/12938

Federico Capoano

unread,
Dec 4, 2015, 12:59:05 PM12/4/15
to Django developers (Contributions to Django itself)
It could be a potential ticket to work on my next django dev sprint.

But first it would be nice to have some basic consensus on how to proceed.

Was it ever discussed in any older thread or ticket?

Bernd Wechner

unread,
Feb 15, 2019, 6:44:22 AM2/15/19
to Django developers (Contributions to Django itself)
I'm curious what consensus looks like. In what forum among which stakeholders. Clearly among developers who have some knowledge of Djangos innards, and so I suspect here. But I find conversation here on this thread so neatly short I can digest it in a short read, and 

https://code.djangoproject.com/ticket/24731
https://code.djangoproject.com/ticket/12938

both touch on the same issue with similarly delectably short comment streams that I could read and digest them in a jiffy. Read another way, not a lot of consensus generation discussion or activity visible.

So I'll play the newb (because I am I guess, well been coding with Django for almost 4 years very very casually part time, and have had this particular issue on my "to solve" list for a long time already. But for me to make progress on it I need to invest some time into learning and some of that empirically through experimentation but I'll try and stimulate a little conversation here by doing some of it through the posing of a possibly naive question or two.

Before that let me state that in my application (and I suspect this is not unusual, I am trying to validate m2m relations and one2m as well in what I'd describe as a rich model object, I use the term rich to describe a handful of models related to one another, all of which are updated with one form submission. In my simplest example I am modelling a game session which has a one2many relationship to ranks, which have a one2one relationship to teams which have an m2m relationship to players. A single game session is logged as the ranked list of teams and their players ... but I wouldn't get lost in the one example, the point is such relationships can't be wildly uncommon and we would all like to keep the validation logic in the models. In my example a session would like to check that there are the right number of ranks, not too few, not too many, as the game allows. Validating this in the form is frustrating as it's a) somewhat more complicated and b) not as secure (allows erroneous saves through means other than this form). And the logic belongs in the model. The session knows how many ranks it's expecting. I've seen other examples just as clear.

And so onto the learning through possibly naive questioning.

As I see it there are two versions of this rich object (objects of several models all interrelated:

1) The database version 
2) the ORM version 

The problem I see is that the ORM versions lack primary keys and the relationships they create until they are saved. So aside from inviting correction of any misunderstandings I may have tabled I will ask the salient question: Is this not a classic application for transactions. Namely we save all of the objects without committing then do the validation of the relationships in the ORM. This seems naive to me as it presupposes a few things that may or may not be true (and I fear are not):

1) The an uncommitted transaction delivers primary keys
2) That an uncommitted transaction can easily be reflected back in the ORM

If these are possible, is it not a good chance then to validate the relations int he respective model's Clean() methods and through an exception on failure, that results in a complete roll back and if it succeeds results in a commit. 

I invite commentary, and discussion in the hope of achieving the elusive beast of consensus that Frederico alludes to.

The main need I see is to create the relationships in the ORM, and it may be possible to do this pre-save too, with place-holder PKs, I do exactly this sort of thing at the form level, managing the relations between the various form elements, with placeholder IDs on elements. And so I imagine loosely if the strategy above is impossible that another might implement something along these lines, a way to create all the objects and have them as ORM objects but with placeholder PKs where needed managing the relations and (a central difficult) a modelled through object for M2Ms - which I imagine can be modelled as a django model with a ForeignKey to each of the related models one such object capturing the relationshiips.

Where real PKs exist, they can be used, where they do not exist a placeholder could be used to model relationships in the ORM only. On save these placeholders are ignored and the database creates PKs as usual. 

I can imagine a number of ways of modelling such placeholders. Either simply encoding otherwise illegal values (-ve numbers), or reserving on ID to flag a placheholder (PK-0 or PK=Maxvalue) and the placeholder PK (PH) is another attribute on model with references to PK returning that if the reserved value is in place etc. Alternately just use legal values in the high range (max value down) for the PKs and use a state flag to indicate they are faux, and to be ignored on save (it could be the simple existing flag that tells us it's unsaved can serve that role).

Anyhow a pile of naive questions and speculations and I hope it stimulates a response or two or three and something that heads towards consensus.  Even if it's something completely different and all this is dismissed ;-).

Kind regards,

Bernd.

Josh Smeaton

unread,
Feb 16, 2019, 8:01:47 AM2/16/19
to Django developers (Contributions to Django itself)
Considering https://code.djangoproject.com/ticket/12938 is still marked as "open" and https://code.djangoproject.com/ticket/24731 is more or less a duplicate (is there a reason it isn't closed as a dupe specifically?), I don't think any more consensus is really required. The problem seems difficult without diving in and getting your hands dirty, which is probably why you're seeing such light discussion on this thread.

What I'd suggest is: put together a proposal and let discussion occur around that. Propose an API, and speak to how it won't impact existing users, and I think you'll find lots of people will have something to say.

FWIW, I'd be interested in seeing some kind of solution. Our ($work$) current response to feature requests that involve validation of m2m fields is "too hard for the effort".

Cheers

Bernd Wechner

unread,
Feb 16, 2019, 8:31:07 PM2/16/19
to Django developers (Contributions to Django itself)
Thanks Josh,

  1. A agree, they seem to be dupes and one should close, but not sure how that gets done, by whom (am a learner not a pro).
  2. I'm happy to take a stab at an API proposal but was throwing some naive questions out there int he hope of seeing some thought shared. In the mean time it prompted me to work on some experiments with the Tutorial models which I'm doing and it's proving interesting to say the least, but as noted by many, it's kind of costly as I spent an evening just learning one little thing about the difference between a CreateView and UpdateView (kicking myself on that but could not find, for the life of me any docs on it or alrady asked and answered enlightening questions on the stackoverflow front or elsewhere. And to write an API worth critiquing I'd want a little more chance of it being a tenable one (i.e. understand the innards a little better, which is one reason I single stepped a lot of Django code last night to learn what I have, and am still not 100% clear on it all).
  3. What I did learn is that it's a red herring to think of this as an m2m issue! It seems universally filed as that but empirically I see that one2m relationships suffer exactly the same problem - which is that the submitted object as no PK until saved, and so the many related objects (be it one to many or many to many, applies to all the "to many" contexts) are not yet known to it during the model validation (clean) calls. Which leads me suspect the true way forward might be indeed to define a form of placeholder PK that is 100% backward compatibl, so looks like PK=None to the rest of Django, excepting for a select view on those objects which respects the placeholder PKs and so makes "to many" relations visible and available for validation at the model not form level (which is what everyone seems to agree is what we want - because it's a) easier, in the ORM not in the form processing code, and b) more robust as it catches submissions from any form or avenue that have to pass model validation.
I totally get your work's response that m2m validation is "too hard for the effort" as I'm finding it that way (have an alpha site running without it for said reason), but seems to me numerous voices agree that for this very reason it's worth there being a Django internal solution as many of us benefit from on and can validate these relations at model level (where it's easier and better).

Cheers,

Bernd.

Bernd Wechner

unread,
Feb 20, 2019, 4:59:47 AM2/20/19
to Django developers (Contributions to Django itself)
Ouch, I am not believing my eyes, but somewhat overjoyed with what I have found in my explorations over recent evenings empirically exploring transactions. The TLDR is this: I am puzzled that I haven't tested and found this to date, and that all the folk I read online asking for m2m ORM validation haven't, So puzzled I had to do a second take, a thorough retake and lots of note taking on the whole process. But in a nutshell it IS possible to do ORM validation of 2many relations - at least with Django 2.1 and Postgresql. My findings are empirical and a dive down into the code again to back it with understanding is on my todo list. But I want to share the findings so far.

Essentially if in your form you override the post() method and use a transaction and add an explicit clean inside of that transaction then the objects clean() is called twice, once via form.is_valid() and the second time explicitly in our post() method. And stunningly on a CreateForm, the first time, with no id and no visible 2many relation data, but on the second all with an id and with all 2many relation data visible and there is at this point no change to the database yet!

In code form finding is clear and tested twice with thorough notes taken and here it is using a Book model straight out of the tutorials with the relations I want tested added:

class Book(models.Model):
    title
= models.CharField(max_length=100)
    authors
= models.ManyToManyField(Author, related_name="books")
    lead_author
= models.ForeignKey(Author, on_delete=models.CASCADE)
   
   
def clean(self):
       
if (self.id is None):
           
# Validate all the model fields that are not One2Many or Many2Many
           
pass
       
else:
           
# Validate all the One2Many or Many2Many fields
           
pass


class Chapter(models.Model):
    title
= models.CharField(max_length=100)
    book
= models.ForeignKey(Book, on_delete=models.CASCADE, related_name="chapters")
    author
= models.ForeignKey(Author, on_delete=models.CASCADE, related_name="chapters")


class BookCreate(CreateView):
    model
= Book
    fields
= '__all__'
   
   
ChapterFormSet = inlineformset_factory(Book, Chapter, fields="__all__")

   
def post(self, request, *args, **kwargs):
       
self.form = self.get_form()
       
       
if self.form.is_valid():
           
try:
               
with transaction.atomic():
                   
self.object = self.form.save(commit=True)
                                       
                   
self.formset = self.ChapterFormSet(request.POST, request.FILES, instance=self.object)
                   
self.formset.is_valid():

                       
self.formset.save()

                       
                   
self.object.clean()    
           
except IntegrityError:
                transaction
.set_rollback(True)
               
return self.form_invalid(self.form)
                                     
           
return self.form_valid(self.form)
       
else:
           
return self.form_invalid(self.form)

I have yet to nut out the nitty gritty of exceptions I throw in clean() and returning a form with error messages, but wanted to report this finding now to see who can shoot it down ;-).

I have not tested form errors with it yet, and I want to dive into the save() code to see how and why it's building a complete ORM with no sign of it in the database yet. It's GREAT that it does.

I have noticed that if the transaction on a Create is rolled back that in Postgresql at least this consumes the id that was provisioned to the ORM in the transaction. By which I mean the next Book created has the next id up, and the one in the baile dtransaction is gap in the id sequence of the table in the database. An interesting observation and of consequence only if ids are in short supply, or for some reason one wishes them to be consecutive (which is hard anyhow if you can delete records).

Still, I admit I am blown away by this discovery, and there it is. We don't, it seems need to do anything to clean our 2many relations in the model itself, only override post() and use a transaction (which is sensible anyhow).

Comments welcome.

Regards,

Bernd.


Bernd Wechner

unread,
Feb 21, 2019, 6:03:53 AM2/21/19
to Django developers (Contributions to Django itself)
Most once sided discussion I've seen on a developers group in a while ;-), but I'll go so far as to suggest an API for the Django Core on this one. I mean I have a way to advance myself now, but methinks this need is so common that perhaps it's best sitting in the Django Core, albeit with a setting as it demands an atomic transaction for create and update views. Now this rests on a few premises, namely:

  1. That there is nothing overly broken with my suggestion that I am not seeing as yet (I am the noob after all, and while pretty Python and Django savvy, not diving into the core code very often at all, as little as I've needed to, the very benefit of a cool framewor, keeping us isolated from the dirty details ;-).
  2. That the observations I tabled are valid across all three database backends that Django supports. if it's only  1 (Postgres) or 2 then it shoudl definitely stay an enable-able option and not a default behaviour.
A suggested API is as follows:

  1. Implement a  setting to enable this new behaviour. Looking over exiting settings they can get quite long in name (like  DATA_UPLOAD_MAX_NUMBER_FIELDS for example) and so perhaps a name like: RELATIONSHIP_VALIDATORS. I take inspiration from this setting:

    AUTH_PASSWORD_VALIDATORS

    Default: [] (Empty list)

    The list of validators that are used to check the strength of user’s passwords. See Password validation for more details. By default, no validation is performed and all passwords are accepted.

    And propose:

    TOMANY_RELATIONSHIP_VALIDATORS

    Default: False

    A boolean that specifies whether to use atomic transactions on in CreateView and UpdateView POST requests in order to support relationship (OneToMany and ManyToMany) validation in the Model itself. If true, Model.clean() is called as usual, then inside an atomic transaction all submitted forms and formsets are saved and Model.clean_relations() is called. It is important to note that when Model.clean() is saved ToMany relationship (creation or changes) are not visible to Model.clean() but are visible in Model.clean_relations() where they can be validated.

  2. Implement the code in CreateView.post(...) and UpdateView.post(...) that does this, that is a little savvier than my example in previous mail, but if form.is_valid() pass opens a transaction, saves the form and then for each other form or formset in the POSTed data (not sure off hand how to find those, more learning for me), check each with their .is_valid() and if passing save it, then call Model.clean_relations() (for each model involved (associated with any of the submitted forms or formsets). Of course has to work if the model fails to implement clean_relations(), either by checking it's there first of defining a default implementation which pay just pass.

  3. If Model.clean_relations() fails (throws an exception), roll back, if not commit. 

It's a rough outline that I think can work. And if it's good (which I await feedback on from the silent developer community) then there may even be a case for defaulting this setting to True, on new projects anyhow as it may be sensible to default to False on any existing site (not introduce a new transaction layer without warning on existing projects that upgrade Django - where there may well be a transaction system in place already).

I mean if it's a good plan, and it actually gets some feedback to that efffect or that help it evolve into a good plan, thus far this looks eminently like something I could implement and submit as a PR (Pull Request), but I'm just as likely to code mine up and run as I am way way busy on other stuff ;-).

Regards,

Bernd.

Ian Foote

unread,
Feb 21, 2019, 5:10:05 PM2/21/19
to django-d...@googlegroups.com
I don't think a new setting is the way to go. I'd prefer to add an attribute (validate_m2m = False?) to the CreateView and UpdateView classes that allows a developer to opt-in to the new behaviour when they need it. This is more flexible and still maintains backwards compatibility.

Regards,
Ian

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.

Bernd Wechner

unread,
Feb 21, 2019, 8:05:26 PM2/21/19
to Django developers (Contributions to Django itself)
That's an idea I like. Though it affects OneToMany relations too  (no idea why this is so often presented as an m2m issue, when it's literally a 2m issue. So I'd prefer the attribute to be something like validate_tomany_relations or validate_2m perhaps. The focus on m2m is misleading.

Either way yes, the attribute could default to False in the core CreateView and UpdateView conserving existing behaviour and be set to True in derived views to win a callback to Model.clean_rleations() (or a method with some other name, the name is up for discussion). Could even be that clean() accepts an optional arg like relationships_available or 2nd_pass or whatever ... and we can code clean() in our Models to have two paths, one on the first pass for the model fields (which can force a bail before we bother to start a transaction that needs rolling back) and a second pass after the forms and formsets are saved inside a transaction and all the relationships are visible.

Of course I'm still not sure if this works on MySQL or Lightdb? MY empircal testing is entirely on Postgresql for now.

But yes I agree it's more flexible as it can be set for individual model views or not as desired rather than for the whole app! Thumbs up really for that.

Chris Foresman

unread,
Feb 21, 2019, 10:55:24 PM2/21/19
to Django developers (Contributions to Django itself)
My guess is this does not work on MySQL or SQLite, since as far as I can tell it’s the only DB that will return IDs when bulk inserting. I’d think you’d have to have some other code path to handle those DB backends.

Bernd Wechner

unread,
Feb 21, 2019, 11:14:19 PM2/21/19
to Django developers (Contributions to Django itself)
That would be my guess too. So if it went into the Django core at all, it would need to be documented as a Postgresql only thing I guess (if the guess proves right). This might also then figure in the recommendation on databases. Postgresql is already recommended as the production database by Django doc somewhere, it's only reason I opted for it when I started this project, on that recommendation. And if, among its benefits is that you validate ToMany relationships in the ORM, that would be a feather in its cap. Was certainly a pleaasant and surprising discovery that I could get away with this in a transaction.
Reply all
Reply to author
Forward
0 new messages