On 01/09/2016 09:29 AM, Shai Berger wrote:
> There are two kinds of data migrations, generally speaking.
> One is the kind that is needed to facilitate some schema migrations. Changing
> the type of some field, for example, usually involves creating the new field
> (schema migration), transforming the data from the old field to the new (data),
> and then removing the old field (and perhaps some renaming; schema). This kind
> of migration, indeed, can just be removed when squashing.
> The other kind is migrations which *create* data -- filling in tables for
> database-implemented enums, for example. If you remove these, you are going
> to break your tests (if you do this and haven't broken your tests, your tests
> are missing -- I'd go as far as calling them broken).
I consider this second kind of data migration to be marginally smelly,
and avoid it whenever possible. If an enum's possible values will change
infrequently, I'd much rather have it in code than in the database
(that's almost certainly more efficient, too). If its values will change
frequently, then it's less likely that I have a strong "default set",
and I'd be more likely to leave the creation of initial values to be an
explicit part of deployment setup (and have the tests also explicitly
create the values they need), rather than use a data migration to
enforce a particular initial set.
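To illustrate the "in code" option, here is a minimal sketch using the
standard library's enum module. The `OrderStatus` enum and its values are
made up for the example, and the `choices()` helper is my own invention; it
just produces the (value, label) pairs that a Django field's `choices`
argument accepts.

```python
from enum import Enum


class OrderStatus(Enum):
    """Hypothetical enum kept in code instead of a lookup table."""

    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"

    @classmethod
    def choices(cls):
        # (stored value, human-readable label) pairs, in definition order.
        return [(member.value, member.name.title()) for member in cls]
```

With something like this there is no table to populate, so there is no
data migration to lose when squashing, and tests see the values without
any setup.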
That said, I do recognize that people use data migrations to create
initial data, and that there are cases where it makes sense.
> The second kind is quite common. Having a built-in command that resets
> migrations and ignores them is, IMO, encouragement to skip testing, and I
> think we shouldn't do that.
I agree that a "resetmigrations" command is dangerous. For my own usage
I'd be more worried about losing custom SQL schema or indexes added via
RunSQL than I would about initial-data migrations, but either way it's
certainly potentially lossy. I'd only be -0 on adding it, but it would
need to come with strong warnings in the documentation.
> Some notion (as Carl, I think, mentioned) of
> "squashable data migrations" -- essentially, telling the two kinds apart --
> would be helpful, but would not solve the problem completely, because, ultimately,
> the second kind exists. We need to figure out how to help users deal with them.
Hmm, I'm not sure if there is any general way to "help users deal with"
non-squashable RunPython/RunSQL migrations -- did you have any
particular ideas in mind? If they aren't squashable, they aren't
squashable, and I don't see what Django can do about that without
requiring the user to think carefully about their actual migration set
and manually modify migrations. You can't even do something like move
initial-data migrations to the end of the set to allow optimizing away
more schema alterations, because the migration may create the initial
data in a way that only works at that particular moment in schema history.
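A toy demonstration of that ordering problem, using plain sqlite3 rather
than Django's migration framework. The table names and the three step
functions are hypothetical; the point is only that the data step is
written against the schema as it stood at one moment, and breaks if it
is replayed after a later schema change.

```python
import sqlite3


def create_table(conn):
    conn.execute("CREATE TABLE status (name TEXT)")


def load_initial_data(conn):
    # Written against the schema as it was right after create_table().
    conn.executemany(
        "INSERT INTO status (name) VALUES (?)", [("open",), ("closed",)]
    )


def rename_table(conn):
    # A later schema change that the data step knows nothing about.
    conn.execute("ALTER TABLE status RENAME TO ticket_status")


def run(steps):
    """Apply steps in order to a fresh in-memory database; return the row count."""
    conn = sqlite3.connect(":memory:")
    try:
        for step in steps:
            step(conn)
        return conn.execute("SELECT count(*) FROM ticket_status").fetchone()[0]
    finally:
        conn.close()


def reordering_is_safe():
    """Try the history with the data step moved to the end."""
    try:
        run([create_table, rename_table, load_initial_data])
    except sqlite3.OperationalError:
        # The data step refers to a table name that no longer exists.
        return False
    return True
```

Run in the original order, `run([create_table, load_initial_data,
rename_table])` succeeds; with the data step moved last,
`reordering_is_safe()` returns False.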
I think we'll be doing pretty well (it would certainly make a big
difference for my projects) if we can just allow users to identify
migrations that can be ignored when squashing.
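Roughly what I have in mind, as a toy model in plain Python rather than
anything that exists in Django's migrations framework: the `Operation`
class, the `ignore_when_squashing` flag, and `squash()` are all names
invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Toy stand-in for a migration operation; not Django's real class."""

    name: str
    # Hypothetical flag: True means "safe to drop when squashing".
    ignore_when_squashing: bool = False


def squash(operations):
    """Return the operations a squashed migration would keep, in order."""
    return [op for op in operations if not op.ignore_when_squashing]


history = [
    Operation("AddField new_price"),
    Operation("RunPython copy old_price to new_price", ignore_when_squashing=True),
    Operation("RemoveField old_price"),
    Operation("RunPython create initial currencies"),
]

kept = [op.name for op in squash(history)]
```

The unmarked initial-data step survives, while the marked copy step is
dropped, which in a real optimizer would then let the surrounding schema
operations be reduced further.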