Support for MySQL `INSERT ... ON DUPLICATE KEY UPDATE ...`. proposed and code example

370 views
Skip to first unread message

Matteius Davis

unread,
May 28, 2014, 8:50:23 PM5/28/14
to django-d...@googlegroups.com

Greetings Django Developers,

I am opening this thread based on my research and existing coding toward a solution of this issue, whereby Django today does not support effective usage of the MySQL specific construct `INSERT ... ON DUPLICATE KEY UPDATE ...`.

First some background:  last month I solved a master-master MySQL replication problem we were having at my place of work whereby certain tables had records that were colliding during fail over events.  The issue originated from our optimistic collection of data, except that in fail over with master-master, we found it was possible to collect the same data in both centers and then the UNIQUE constrain on the keys would blow up.

I solved the problem, by opening a mysql.py and writing two methods, which utilize the Django internals, to essentially extend the INSERT construct, and to generate what is essentially a valid Upsert construct.   It utilizes the MySQL functionality outlined here:  http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

I now have the support of the development team to work towards a Django patch that which would adapt this functionality in the most transparent way possible to the ORM, in order to add further support for this MySQL specific feature.

There are two calling patterns which are important here:

  1. The update idempotent field pattern, where we simply pass in the name of the primary key such that we assign the key to be equal to the key – essentially, if the record does exist, don’t truly update anything about it.

  2. The update all fields pattern, where by if the original INSERT clause failed due to duplicate key it will intend to go ahead with updating all fields, as if the original INSERT had worked out of the box.

The difference between the two, and why this is important, is because if we are creating a dummy record at some point (such as a FK constraint but prior to knowing how we want to fill out the record) then we don’t want this INSERT to collide and take out the creation of the real data within the record.  As a result we have two ways to call this.

I hope this is about to become more clear as I share the code sample which I wish to adopt into a patch, such that I might fulfill my dream of becoming a Django contributor.

I now have permission to adapt this code to be a suitable patch for Django to adopt support of this MySQL specific feature.  So I am sharing it now here in its 37 lines of glory.

https://gist.github.com/matteius/fff39563d1d8ddfc7168

 

I am looking for general feedback as well as some specific feedback:

  1. Right now how I actually execute this custom logic is slightly ugly: I check if the settings DATABASE engine is MySql, and we call the specialized logic, otherwise I use the regular model.save() because it means the code is being run from unit tests on a sqlitedb.

  2. To me it seems somewhat similar to update_or_create, but is still MySQL specific and has the calling patterns mentioned above.  Is there a way this might be a well suited MySQL specific override to a pattern such as `update_or_create` or is there another pattern more suited?

  3. When we look at a database specific feature such as a MySQL statement such as this, what kind advice or guidance can be given around unit testing a database specific feature?  My only though about this so far is it might be possible to test the semantic construct being generated, and to even mock out the sql execution in the test.  – this brings me to a sub-question I’d like to know more about Django …

    3.b.) What DB powers the Django unit tests (sqllite?) and Does Django ever mock out function calls in the test—I often use the mock library but I doubt that is being used here.

       4.) I hope this is well received, I think supporting this DB specific features that helps solve problems that arise out of more complicated configurations is important to Django maturing as a product. 

I had found a couple older items out there referencing the lack of support, but this solution to the problem is technically my own and I think could be well adapted for Django.  Since this is a MySQL specific feature, it is not totally clear to me how this best fits within the scope of the ORM.

I wrote here first rather than opening a ticket, because I read the guidelines.  Despite attending the DjangoCon 2013 sprints, I am still not technically a contributor to Django—this is my best shot yet.

Russell Keith-Magee

unread,
May 28, 2014, 10:16:51 PM5/28/14
to Django Developers

Hi Matteius,

On Thu, May 29, 2014 at 8:49 AM, Matteius Davis <matt...@gmail.com> wrote:

Greetings Django Developers,

I am opening this thread based on my research and existing coding toward a solution of this issue, whereby Django today does not support effective usage of the MySQL specific construct `INSERT ... ON DUPLICATE KEY UPDATE ...`.

First some background:  last month I solved a master-master MySQL replication problem we were having at my place of work whereby certain tables had records that were colliding during fail over events.  The issue originated from our optimistic collection of data, except that in fail over with master-master, we found it was possible to collect the same data in both centers and then the UNIQUE constrain on the keys would blow up.

I solved the problem, by opening a mysql.py and writing two methods, which utilize the Django internals, to essentially extend the INSERT construct, and to generate what is essentially a valid Upsert construct.   It utilizes the MySQL functionality outlined here:  http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

 

I now have the support of the development team to work towards a Django patch that which would adapt this functionality in the most transparent way possible to the ORM, in order to add further support for this MySQL specific feature.

There are two calling patterns which are important here:

  1. The update idempotent field pattern, where we simply pass in the name of the primary key such that we assign the key to be equal to the key – essentially, if the record does exist, don’t truly update anything about it.

  2. The update all fields pattern, where by if the original INSERT clause failed due to duplicate key it will intend to go ahead with updating all fields, as if the original INSERT had worked out of the box.

The difference between the two, and why this is important, is because if we are creating a dummy record at some point (such as a FK constraint but prior to knowing how we want to fill out the record) then we don’t want this INSERT to collide and take out the creation of the real data within the record.  As a result we have two ways to call this.

I hope this is about to become more clear as I share the code sample which I wish to adopt into a patch, such that I might fulfill my dream of becoming a Django contributor.

I certainly hope we can realise this dream :-)
 

I now have permission to adapt this code to be a suitable patch for Django to adopt support of this MySQL specific feature.  So I am sharing it now here in its 37 lines of glory.

https://gist.github.com/matteius/fff39563d1d8ddfc7168

 

I am looking for general feedback as well as some specific feedback:

  1. Right now how I actually execute this custom logic is slightly ugly: I check if the settings DATABASE engine is MySql, and we call the specialized logic, otherwise I use the regular model.save() because it means the code is being run from unit tests on a sqlitedb.

Roughly speaking, that's the right approach - however, you shouldn't do it as an "if MySQL" check in the main ORM code. What you should be doing is either:

 * Abstracting the block of functionality for which MySQL has a native implementation, putting the current code into the "base" database backend, and providing a MySQL-specific override in the MySQL backend, or

* Don't check for a backend - check for a database feature. You'll notice that Database backends all define a list of supported features -- for example, "interprets_empty_strings_as_nulls". This means that internal logic of the ORM is determined by specific features that are (or are not) supported by backends, rather than an "Is this MySQL" check. This is analogous to best practice in front-end design - you don't check for "Is this IE6?", you check for "Does the DOM support feature X?".

A combination of these two may be necessary - for example, a feature to identify that the alternate logic is possible, and a backend call to construct the specific command that implements that feature.
  1. To me it seems somewhat similar to update_or_create, but is still MySQL specific and has the calling patterns mentioned above.  Is there a way this might be a well suited MySQL specific override to a pattern such as `update_or_create` or is there another pattern more suited?

We need to be careful about terminology here. Are we talking about a DB specific feature, or a general concept that has an implementation specific rendition?

Historically, Django doesn't provide much support for DB specific features - we've tended to keep the ORM a "lingua franca" between databases. 

However, that doesn't mean that every database has the same implementation. Even basic things like quoting are completely different between platforms, and Django abstracts around those differences to provide a consistent API.

What we try to capture is the 'bigger picture' - what is the feature that you're actually trying to accomplish, and how would you accomplish that on each backend, using the best available features of each backend?
  1. When we look at a database specific feature such as a MySQL statement such as this, what kind advice or guidance can be given around unit testing a database specific feature?  My only though about this so far is it might be possible to test the semantic construct being generated, and to even mock out the sql execution in the test.  – this brings me to a sub-question I’d like to know more about Django …

    3.b.) What DB powers the Django unit tests (sqllite?) and Does Django ever mock out function calls in the test—I often use the mock library but I doubt that is being used here.

Django's tests run on *all* the supported databases. We run the full suite once for each database, on each supported Python version. We're also trying to expand on to different operating systems. If you check our CI system (ci.djangoproject.com), you'll see the test suite is executed 20 times for each "build", and we do these 20 builds for each version we support.

If you've genuinely got a database specific feature, then you wrap the test in a skipif() (or, if you're dealing with a database feature, skipIfDBFeature). You can find lots of examples of these in Django's existing test suite; they're all there to make sure that each backend behaves the right way. However, a test skip should be a last resort, and only used if a specific feature *cannot* be executed on a specific database. Where possible, the test should run on all backends, just with a different underlying implementation.
 

       4.) I hope this is well received, I think supporting this DB specific features that helps solve problems that arise out of more complicated configurations is important to Django maturing as a product. 

I had found a couple older items out there referencing the lack of support, but this solution to the problem is technically my own and I think could be well adapted for Django.  Since this is a MySQL specific feature, it is not totally clear to me how this best fits within the scope of the ORM.

As I said before - supporting a specific MySQL construct isn't in the scope of the ORM, but providing support for a broader, abstracted idea that has a specific rendition in MySQL *is* in the scope of the ORM. The catch is finding the right abstraction. 

The concept of UPSERT may well be the right abstraction here; Oracle supports upsert (called MERGE), and it's on the PostgreSQL roadmap too (although it won't be in 9.4). However as Django's current code shows, it's possible to fake a poor-developer's upsert using INSERT and UPDATE, so we already have a fallback.

I wrote here first rather than opening a ticket, because I read the guidelines.  Despite attending the DjangoCon 2013 sprints, I am still not technically a contributor to Django—this is my best shot yet.

It's certainly a great start :-)

Yours,
Russ Magee %-)
Reply all
Reply to author
Forward
0 new messages