DB Migrations: Proposed fix for "Index name collisions after objects are renamed then re-created"

572 views
Skip to first unread message

tomv

unread,
Nov 20, 2014, 4:56:46 AM11/20/14
to django-d...@googlegroups.com
Hi all,

At the Django under the hood sprint I picked up #23577 which says, in the general case: 
  1. For any database object when django creates an associated index
  2. If you rename that object
  3. Then re-create another with the original's name
  4. You get a collision in the name of the associated index
This applies in the simplest case to renaming any field with db_index=True, then re-creating another in it's place. But also renaming a model containing a ForeignKey or indexed field, then re-creating the model. (All via db migrations)

The instinctive desire is to rename indexes when renaming objects whose name was used in the index creation. But speaking to Andrew Godwin, he feels this would be quite a large challenge, across all our supported backends.

The alternative he suggested was to add a random element to all index names the schema editor creates. This post is seeking to validate that proposal.

Note: it's only the names of indexes at issue here, databases keep everything pointing at the right objects. There's also no issue with Django needing to access an index via a predictable name, as there's introspection which grabs indexes according to the objects they work on.

Bonus question: given the scenario above, can we come up with a source of index naming that differs after the "parent" object is renamed/re-created, but which remains deterministic? Something like pulling in a hash of the whole app model state. Unfortunately, as we aim to output migrations to sql, we can't have any introspection at migration application time, so for example no checking for name collisions.

Cheers,
Tom

Andrew Godwin

unread,
Nov 20, 2014, 1:08:15 PM11/20/14
to django-d...@googlegroups.com
Just chiming in on this; I sat down with Tom to look at this during the sprint, and while we could go to a large amount of effort to try and rename indexes on every rename of a model/field, the problem is solved much more easily by introducing a random element to index names.

This does, of course, make the databases that migrations creates have subtly different schemas; this doesn't matter to Django, as we always look up indexes based on characteristics rather than names, but if you have any examples of where predictable names do matter, please share them!

(also, I had an idea on the way back that perhaps we could derive the random element from the migration name, making it different across different migrations so they don't clash but still the same across runs, but I'm not sure how you'd access that from within SchemaEditor)

Andrew

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/99246d04-b205-42a6-a389-a0370bea3cd8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Carl Meyer

unread,
Nov 20, 2014, 1:14:06 PM11/20/14
to django-d...@googlegroups.com
On 11/20/2014 11:07 AM, Andrew Godwin wrote:
> (also, I had an idea on the way back that perhaps we could derive the
> random element from the migration name, making it different across
> different migrations so they don't clash but still the same across runs,
> but I'm not sure how you'd access that from within SchemaEditor)

Could the SchemaEditor method to create an index take an additional
optional argument (not sure what you'd call it) that's an arbitrary
string to include in the name? The appropriate Operation class(es) would
also have to take this argument, and then the generated migration files
could include it (hardcoded).

Carl

signature.asc

tomv

unread,
Nov 25, 2014, 6:13:37 PM11/25/14
to django-d...@googlegroups.com
So one option as you suggest Carl, is to pass a hard coded string into the Operation when it's instantiated in the user's migration file. I've taken a similar approach, starting one level lower, injecting the migration name into database_forwards methods. https://github.com/tomviner/django/compare/ticket_23577_with_poc_migration_name_fix

This then passing into schema_editor public method calls (create_model, add_field and alter_field, which all need their signatures changed over BaseDatabaseSchemaEditor and all the database specific schema editor backends). Then the migration name is passed through internal schema editor methods like _alter_field, _create_unique_sql, _create_index_sql, _create_fk_sql and finally on to _create_index_name.

See how the tests have to pass in extra_index_suffix='schema_test.000x' as they don't go through an actual migration file.

To sum up this solution: it's not localised and it's not pretty. I'm sure it could be tidied up a little, but unless it can be collapsed to a much smaller patch, I'd say having deterministic index names isn't worth the impact.


Instead, I propose the following solution: a random string injected into the formation of all index names. A much more localised patch: https://github.com/tomviner/django/compare/ticket_23577_with_poc_random_string_fix

So final call for examples where having predictable index names matter!

One thing I'm still looking at, regardless of the solution chosen, is whether we need to include spatial indexes too.

Tom

Shai Berger

unread,
Nov 27, 2014, 3:44:28 PM11/27/14
to django-d...@googlegroups.com
Hi,

Just noted this thread, and I have two points:

1) Using the migration name in the index isn't really "predictable", and isn't
even completely stable (the name changes when migrations are squashed), unless
I'm missing something.

2) The current practice of identifying indexes by the columns they use is
sufficient now, but may not suffice when we have finer control over indexes (Marc
Tamlyn is writing the DEP[1] for this). I can very easily see, for example,
uses for two partial indexes on the same field (on PG, where partial indexes
exist).

I'm not sure we need fully predictable index names now, but my hunch is that
we are going to need them down the road, and an explicitly-set suffix is a lot
closer to what we'll need than a random one.

My 2 cents,
Shai.


[1] https://github.com/django/deps/pull/6/files

Florian Apolloner

unread,
Nov 28, 2014, 2:51:31 AM11/28/14
to django-d...@googlegroups.com


On Thursday, November 27, 2014 9:44:28 PM UTC+1, Shai Berger wrote:
1) Using the migration name in the index isn't really "predictable", and isn't
even completely stable (the name changes when migrations are squashed), unless
I'm missing something.

I think squashing isn't an issue, the migration name would get hardcoded into the operation as argument and squashing would just have to preserve that.

Cheers,
Florian

tomv

unread,
Nov 30, 2014, 7:54:46 AM11/30/14
to django-d...@googlegroups.com
Thanks for pointing out the index improvement DEP Shai, I hadn't seen that yet.

It's important to note that right now, index names are not re-creatable in retrospect. They rely on the names of models and fields, that are free to be renamed. So a complete rethink will be needed if introspection can no longer be used for user-specified types of indexes. For example, maybe the user should choose the name, which they should make unique at the app level? Or if not, Django will need to either keep a record of the generated index name somewhere, or start renaming indexes to keep them up-to-date with the components in their names.

What's the best way to proceed with the index name collision ticket #23577 at this point then? I can:
  1. re-write my "use migration names in index names" branch to allow index suffix names passed from migration operations?
  2. or declare there's no consensus on the solution / we'll wait for Mark's index DEP - in which case, can I submit my tests as xfails?
Cheers,
Tom

Carl Meyer

unread,
Dec 1, 2014, 12:18:31 PM12/1/14
to django-d...@googlegroups.com
On 11/30/2014 05:54 AM, tomv wrote:
> It's important to note that right now, index names are not re-creatable in
> retrospect. They rely on the names of models and fields, that are free to
> be renamed. So a complete rethink will be needed if introspection can no
> longer be used for user-specified types of indexes. For example, maybe the
> user should choose the name, which they should make unique at the app
> level? Or if not, Django will need to either keep a record of the generated
> index name somewhere, or start renaming indexes to keep them up-to-date
> with the components in their names.

I think one way or another with the indexes DEP we will need to surface
the name of an index as an explicit part of its deconstructed
specification (just like with fields, models, etc). This implies
probably letting the user pick an explicit name if they want. We could
also keep the option of using an autogenerated default name, but in that
case migrations should notice if that autogenerated name changes and
generate an index-name-change operation.

> What's the best way to proceed with the index name collision ticket #23577
> <https://code.djangoproject.com/ticket/23577> at this point then? I can:
>
> 1. re-write my "use migration names in index names" branch
> <https://github.com/tomviner/django/compare/ticket_23577_with_poc_migration_name_fix>
> to allow index suffix names passed from migration operations?

What about having the entire index name explicitly passed from the
operation in migrations (even if for now its always autogenerated
there)? That seems like maybe a step towards where I think we'll
eventually need to be. (But take with a grain of salt, you've been in
this code and I haven't).

> 2. or declare there's no consensus on the solution / we'll wait for
> Mark's index DEP - in which case, can I submit my tests
> <https://github.com/tomviner/django/compare/ticket_23577_with_tests> as
> xfails?

Hmm, that's an interesting idea. It's not something we've generally done
in the past, but it might be better than just letting the tests bit-rot
as an un-merged PR, later probably needing to be rebased. I'm curious
what others think of the idea of actually merging xfail tests for
unfixed bugs?

Carl

signature.asc

Tim Graham

unread,
Dec 1, 2014, 12:35:17 PM12/1/14
to django-d...@googlegroups.com
I'm not in favor of merging expectedFailure tests:
* It adds the overhead of time running tests that we know aren't expected to work.
* It's more code to maintain (might go stale in the meantime anyway).
* An issue might be obsoleted (via deprecation, etc.) at some point.
* When viewing commit history, it seems clearer to add tests when fixing an issue rather than removing a decorator on a test (the contents of the test won't be easily viewable in the latter case).

We currently have < 10 tests marked as such in the test suite now. I'd like to audit them to check their state. Here's a ticket if anyone wants to help with that: https://code.djangoproject.com/ticket/23943

Markus Holtermann

unread,
Dec 1, 2014, 2:15:45 PM12/1/14
to django-d...@googlegroups.com
Hey folks,

I don't like the idea of expected failures either.

Given the fact that at one point user defined indexes are going to be introduced, I would consider to delay this issue for now until the DEP has been accepted and is (being) implemented. At that point we know what the underlying API in e.g. the schema editor is and how we are generating the indexes in migrations. We might end up with AddIndex, AlterIndex and DeleteIndex operations where CreateModel etc don't know about the index.

Introducing some - in any way hacky-ish - "solution" to prevent duplicate index names would probably be deprecated as soon as custom indexes are implemented. That said, touching the migration operations for 1.8 (LTS) in a way that we are not sure yet is are going to stay, and later introducing a new API, seems a bit odd and complicated in terms of backwards compatibility from 1.9? where indexes are introduced.

/Markus

Marc Tamlyn

unread,
Dec 1, 2014, 5:35:49 PM12/1/14
to django-d...@googlegroups.com
I intend the new indexes to have customisable names, and to deconstruct their name into the migration. This means that any changes in the name (if the name is autogenerated) would be detected. It should be possible to do renames.

It is worth noting that mysql (and sqlite obviously) do not support renaming of indexes and the index would need to be dropped and rebuilt. On large tables this would be a problem, so we'll need some warnings in the documentation about how to fix the index name so it is not identified as a change.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

tomv

unread,
Dec 1, 2014, 6:30:02 PM12/1/14
to django-d...@googlegroups.com
Thanks for clarifying that Mark. I've updated the ticket mentioning that we'll leave the resolution of the index name collision issue to your DEP.
Reply all
Reply to author
Forward
0 new messages