Interesting approach -- I wouldn't have thought of this! It feels a
bit nasty to me, but it's rather close to how deferred constraints
work behind the scenes anyway. I'd like to see some feedback from
other people who're experienced with MySQL; I don't know enough about
any potential downsides to spot 'em. But I think you've found a way to
cut the knot of this problem, and barring a better solution I'd like
to check this one in.
> Anyhow, the patch should be 100% there so I'd love if someone could
> check it out and either push it along or let me know if any changes
> are required. It should be easy enough for me to address any issues
> while the whole problem is in my head.
I left some comments on the patch on the ticket.
> I'm using django-threadedcomments on a project, which has forward
> references in one of its test fixtures, so I'm reminded of this issue
> every time I run my project tests. I hate test errors! I'm hopeful the
> latest patch is sufficient to get this issue resolved.
Does this patch work for you (I'm assuming so)? A "this works for me
in production" goes a huge way towards me feeling comfortable checking
things in.
Jacob
I left some comments on the patch on the ticket.
> I'm using django-threadedcomments on a project, which has forward
> references in one of its test fixtures, so I'm reminded of this issue
> every time I run my project tests. I hate test errors! I'm hopeful the
> latest patch is sufficient to get this issue resolved.Does this patch work for you (I'm assuming so)? A "this works for me
in production" goes a huge way towards me feeling comfortable checking
things in.
Yeah, sorry -- I wasn't precise enough with my language. I meant
"works with a real-world project" -- since I don't use MySQL, I can't
test this out at all. So I need to rely on other people to tell me
"yeah, works fine."
> It'd be great if some others could test this patch out as well. In
> particular, I can't be sure how well ti works on older versions of MySQL.
> There's no crazy magic at work here but, well, you never know.
Mmhhm, that's a big one. This *is* MySQL we're talking about... :)
> If it works for others and we're all in agreement with the philosophical
> approach of the solution, I think it should be a fairly uncontroversial
> commit, since it's really primarily a fixture loading issue at its core.
If I get a couple-three reports of "worksforme" and if we can pin down
that this works with all the well-supported MySQL versions -- 4.1+,
essentially -- that's enough for me.
Jacob
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Unfortunately, not much. Your test has validated that the extra code
doesn't break anything under MyISAM, and this is certainly useful.
However, the root problem only exists with InnoDB because of its...
eclectic... implementation of row level constraints. MyISAM doesn't
have constraints, so your test hasn't exercised the part of the code
that needs to be exercised heavily.
Yours,
Russ Magee %-)
> Unfortunately, not much. Your test has validated that the extra code
> doesn't break anything under MyISAM, and this is certainly useful.
> However, the root problem only exists with InnoDB because of its...
> eclectic... implementation of row level constraints. MyISAM doesn't
> have constraints, so your test hasn't exercised the part of the code
> that needs to be exercised heavily.
One unintended benefit of the current patch is that it actually will (if I'm not mistaken) check for invalid foreign key references in MyISAM as well, since the check is run for all MySQL implementations (though not for any other backends). So MyISAM would normally silently allow bad foreign keys to be loaded from your fixture data, but with the patch you'll now get an error.
> I see a variable saved_objects being written, but I don't see it being accessed -- is this to ease future features, or am I missing a code path?
Thanks good catch. This was a remnant from an earlier iteration of this patch, in which I tried (unsuccessfully) to call save() on any objects created during the fixture load after foreign key checks had been re-enabled, in hopes that the UPDATE call this generated would trigger a constraint check. It did not, because MySQL ignores UPDATE if the data being set matches what's already in the DB. I have some other changes I'm making in response to some of Jacob's comments on the ticket, so this will be gone in the next version of the patch.
> If I'm reading correctly, check_for_invalid_foreign_keys extends over all the rows in a table. loaddata is called by syncdb and South's migrate, not just when a db is set up, so this could easily wind up run over lots and lots of non-fixture data. I don't know MySQL's performance characteristics that well -- is this likely to be expensive?
This is a good question.
First, though it's true that loaddata is called in several places other than during tests, I would imagine it is rarely if ever called during a normal request/response cycle in production. I'm operating under the assumption that the performance of this patch is important but not as critical as it would be if this was a command used in production.
So that caveat aside, I will try later to find a large data set I can run the query on to get an idea of what kind of performance hit the check entails. If anyone else has a large data set with two related tables, you can try it out as well and report your results. The basic structure of the SQL statement being run is as follows:
SELECT REFERRING.`id`, REFERRING.`id_pointing_to_some_related_table ` FROM `some_table` as REFERRING
LEFT JOIN `some_related_table` as REFERRED
ON (REFERRING.`id_pointing_to_some_related_table` = REFERRED.`id`)
WHERE REFERRING.`id_pointing_to_some_related_table ` IS NOT NULL
AND REFERRED.`id` IS NULL
And keep in mind this is being called for *each* relation in the table. So that'll run through all the rows once for every relation. Of course, it won't return anything unless you have bad data, so the i/o aspect should be minimal.
If we notice an "unacceptable" performance hit the likely solution would be to scope that select statement to only rows that were added during the load data routine. Ideally I'm hoping to steer clear of that, because it entails more work and complexity in loaddata, since you'd have to track all the IDs that were added and then pass them along here. Wouldn't be a big deal but that's why I only wanted to do it if necessary.
Thanks for your input and help testing this btw!
Jim
So that caveat aside, I will try later to find a large data set I can run the query on to get an idea of what kind of performance hit the check entails. If anyone else has a large data set with two related tables, you can try it out as well and report your results.
My thinking at this point would be the performance is "good enough" for the scope of the current ticket, and that if better performance were required or desired, that could be facilitated under a separate ticket, assuming there was community demand for it.
Also, though I don't have MySQL 4 handy to test on, I'd be astonished if there were any issue there compared to MySQL 5. The set foreign_key_check command is certainly supported in MySLQ 4 and the select being issued to do the check is nothing special at all.
I have not had time to try out the patch, but did look at it. Doesn't the base implementation of disable_foreign_key_checks need to return False instead of just passing? The return value is used in loaddata processing to decide whether it's necessary to re-enable/check.
Thanks for working on this -- wish I'd thought of that idea two years ago!
I have not had time to try out the patch, but did look at it. Doesn't the base implementation of disable_foreign_key_checks need to return False instead of just passing? The return value is used in loaddata processing to decide whether it's necessary to re-enable/check.
It actually doesn't *need* to return False; pass is the same as not returning anything or returning None. The boolean check just treats it the same way as False. "Should it?" is another question. On the one hand it's a bit more clear, this value is called and always returns False, unless a backend has overridden it. On the other hand, pass is in keeping with other methods in that class that are meant to be overridden in backends, so I went with pass to emphasize that aspect of the code.
> It actually doesn't *need* to return False; pass is the same as not returning anything or returning None. The boolean check just treats it the same way as False. "Should it?" is another question. On the one hand it's a bit more clear, this value is called and always returns False, unless a backend has overridden it. On the other hand, pass is in keeping with other methods in that class that are meant to be overridden in backends, so I went with pass to emphasize that aspect of the code.
>
> Hmm, well, I did not know that falling off the end of a method with pass guaranteed a return value of False or None (and can't find that noted in the doc here:http://docs.python.org/tutorial/controlflow.html#pass-statements) so in my mind explicitly returning False would be clearer/better...
Ah, okay. To be clear, the return value actually has nothing to do specifically with the use of pass or not. The behavior of Python is to return None from any function that does not explicitly return a value. This is from the exact same page in the Python docs you linked to:
"In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it’s a built-in name). Writing the value None is normally suppressed by the interpreter if it would be the only value written."
Obviously it's a fairly minor point but we want things to be as clear as possible. Anyone else wants to weigh in on "pass" vs. an explicit return value?
> Sorry for the noobish question but, could someone explain the definition of "forward references"?? Is this a MySQL or a django term?? Google wasn't very forthcoming :X
Jacob actually requested that I add a note in the database docs on the topic this patch addresses. So as a way of answering your question I'll paste that note verbatim here:
In previous versions of Django, fixtures with forward references (i.e.
relations to rows that have not yet been inserted into the database) would fail
to load when using the InnoDB storage engine. This was due to the fact that InnoDB
deviates from the SQL standard by checking foreign key constraints immediately
instead of deferring the check until the transaction is committed. This
problem has been resolved in Django 1.4. Fixture data is now loaded with foreign key
checks turned off; foreign key checks are then re-enabled when the data has
finished loading, at which point the entire table is checked for invalid foreign
key references and an `IntegrityError` is raised if any are found.
Is that note illuminating to you? If not, then presumably it needs work :)
--
"In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it’s a built-in name). Writing the value None is normally suppressed by the interpreter if it would be the only value written."
Disabling foreign key checks is exactly what mysqldump puts at the
start of the dump. Presumably your database will have no bad
references to start with, so the resulting dump shouldn't have any
either, and thus be safe to load with foreign key checks off.
http://dev.mysql.com/doc/refman/5.5/en/innodb-foreign-key-constraints.html
(for reference)
--
Question the answers
if no "return" statement is reached, the function will return None. See
"None" paragraph in [1]
[1]
http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy
Marco
Interested - most certainly. The issue (as always) is finding the time :-)
I'm certainly grateful for your efforts; this is a long standing issue
that I'd love to see addressed; I'm just not in a position to commit
to anything. I'll put this on my list of things to look at, but I'm
not getting a whole lot of time to look at that list at the moment.
> * There's a DB feature can_defer_constraint_checks . I couldn't find much
> by way of documentation or or usage of this feature. But I was trying to
> figure out if the work we are doing here is what this feature refers to, and
> if so, if we should be marking this as True for MySQL and SQLite with this
> implementation. I'm not sure there is other behavior that is required or
> expected in that attribute. Anyhow, that might also be a path forward for
> the issue I raise above (ie. we could skip the test if
> can_defer_constraint_checks is not True).
This feature flag exists essentially to verify whether MySQL InnoDB is
currently in use. It isn't used during normal runtime; it's purely a
test skipping flag. It's used to identify tests that need to be
skipped because the test data requires a circular reference which
(historically) hasn't been possible under InnoDB because constraints
aren't checked.
Essentially, this feature flag shouldn't need to exist at all; it only
exists so that InnoDB passes the test suite without errors. If we are
able to resolve the issue that allows MySQL to use forward references
in data, then this feature flag shouldn't be required at all.
Yours,
Russ Magee %-)
If you aren't using InnoDB, then it probably doesn't matter if you turn foreign key checks on or off: it becomes a no-op. There are some other storage engines that support transactions, and some of them might "do the right thing" with respect to deferred foreign key checks, but I think it does no harm to disable the checks on loading fixtures on these engines. In other words, always turn them off when loading fixtures (and back on afterwards) and don't care about the storage engine and it should be fine. Keep it simple.
This isn't true since the introduction of the new deletion code last
fall. The feature flag is used in db/models/deletion.py to detect
whether it is necessary to null out nullable FKs prior to deletion. On
backends which can defer constraints, this nulling-out would add a
needless extra query. On backends which can't defer, nulling-out means
the nullable FK relationship doesn't dictate a deletion-ordering
constraint, which lets us handle circular FK references as long as at
least one of them is nullable (otherwise we'd have to just throw up our
hands and hope for the best, likely resulting in an integrity error).
Since this use of the flag is unrelated to fixture loading, the
fixture-loading fix should not change the value of the flag for any backend.
Carl