On the need for separating data migrations from schema migrations


Augustin Riedinger

Apr 14, 2015, 11:49:33 AM
to rubyonra...@googlegroups.com
Freelancing with Rails, I keep hitting the same issue when I join an existing project: outdated migrations that fail dozens of times during
rake db:migrate

To fix those, I need to do very hacky things, like commenting out the failing lines, and also the schema changes that have already been applied, which is very error-prone. Anyway, that's not the way it should be.

Should I use
rake db:schema:load
instead? From what I have read, that is not considered good practice.

After some thinking, I came up with the simple idea of separating schema migrations from data migrations.

Why?
Because it's always the data migrations that fail.

E.g.
class AddStatusToUser < ActiveRecord::Migration
  def up
    add_column :users, :status, :string
    User.find_each do |user|
      user.status = 'active'
      user.save!
    end
  end

  def down
    remove_column :users, :status
  end
end
If you later remove or rename the `User` class, this migration is broken.
After some searching, I found this very nice article that summarizes all the pain points that come from not separating data migrations from schema migrations: http://railsguides.net/change-data-in-migrations-like-a-boss

Here, Andrey Koleshko suggests the following syntax:


class CreateUsers < ActiveRecord::Migration
  def change
    # Database schema changes as usual
  end

  def data
    User.create!(name: 'Andrey')
  end
end
Instead, I would suggest having two types of migrations: normal migrations, which remain unchanged, and data migrations, which would work the following way:
class CreateDefaultUser < ActiveRecord::DataMigration # Inherits from a different class: DataMigration
  def up
    User.create!(name: 'Andrey')
  end

  def down
    User.find_by_name('Andrey').destroy
  end
end
I reckon it makes sense to put them in separate files, as this is not the same type of database modification.

They would be created by calling
rails generate migration --data CreateDefaultUser

They would behave slightly differently: 
- Ignored when running something like:
 rake db:migrate --no-data # or
 rake db:setup --migrations

- or at least non-blocking (rescuing exception as a warning)
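
To make the two behaviours concrete, here is a minimal sketch in plain Ruby. It does not touch Rails at all: `SchemaMigration`, `DataMigration`, `run_migrations` and the `include_data:` option are hypothetical names made up for illustration, standing in for the proposed `ActiveRecord::DataMigration` class and the `--no-data` flag. Neither exists in Rails today.

```ruby
# Hypothetical sketch, not Rails API: plain Ruby stand-ins for the two
# kinds of migration proposed above.
class SchemaMigration
  def data?
    false
  end
end

class DataMigration
  def data?
    true
  end
end

# Runs each migration's #up in order.
#   include_data: false -> data migrations are skipped (the "--no-data" idea)
#   include_data: true  -> data migrations run, but a failure is downgraded
#                          to a warning instead of aborting the whole run
# Failures in schema migrations always propagate.
# Returns the warnings collected from failed data migrations.
def run_migrations(migrations, include_data: true)
  warnings = []
  migrations.each do |migration|
    if migration.data?
      next unless include_data
      begin
        migration.up
      rescue StandardError => e
        warnings << "#{migration.class.name} failed: #{e.message}"
      end
    else
      migration.up
    end
  end
  warnings
end

# Example migrations (hypothetical). LOG records which schema steps ran.
LOG = []

class AddStatusToUsers < SchemaMigration
  def up
    LOG << :add_status_column
  end
end

class BackfillUserStatus < DataMigration
  def up
    # Simulates a data migration that references a model removed since.
    raise NameError, "uninitialized constant User"
  end
end

class AddIndexOnStatus < SchemaMigration
  def up
    LOG << :add_status_index
  end
end

MIGRATIONS = [AddStatusToUsers.new, BackfillUserStatus.new, AddIndexOnStatus.new]
```

Calling `run_migrations(MIGRATIONS)` runs both schema migrations and returns a single warning for the broken data migration. Calling `run_migrations(MIGRATIONS, include_data: false)` never touches the data migration. Whether a real implementation should rescue or skip by default is exactly the open question here.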

The cool thing about this is that there is no backward compatibility issue here.

You could say that's OK, there's already a gem that does that, and those who need it can use it. But that's not true, as most developers would find out about the gem way too late, when all their broken migrations are already there. Plus, if we call it a "best practice", it should not be optional.

If it does make sense, I would definitely look at how to write such a change.

Thanks for your consideration.

Jason Fleetwood-Boldt

Apr 14, 2015, 12:06:37 PM
to rubyonra...@googlegroups.com

Augustin-

I already solved this problem with a Gem I wrote. Instructions here: https://github.com/jasonfb/nondestructive_migrations

As for whether or not it should be in Rails core, I cannot speak to that. 

Generally speaking, on large Rails apps most developers never run the schema migrations from the beginning of time. This is true of most of the larger apps I’ve worked on. Unless you tirelessly backport fixes to old migrations, old migrations will eventually fail as Rails gets upgraded (simply from deprecations in syntax). You basically have two options: (1) always work to keep your migrations current over the lifetime of the app (you really can’t do that as a freelancer; it needs to come from the team lead), or (2) forget about running the schema migrations from the dawn of time and just import your production data.

Most larger apps eventually switch over to strategy #2 described above. 

In terms of your schema migration-data migration split— yes, the benefits you describe are inherent. Also, you could strategically choose to keep the schema migrations running from the beginning of time but not the data migrations. 

As well, I use data migrations to speed up deployment. In particular, running background data migrations while the app is live helps me reduce hours of downtime down to minutes. (My last enormous data migration took 11 hours).

-Jason






Pranas Kiziela

Apr 15, 2015, 5:07:50 AM
to rubyonra...@googlegroups.com
Jason, how do you avoid the same data migration problems in your gem? If models you reference in a migration change, the migration would break, wouldn't it?

Augustin, I find it best to not reference any app models but rather define new classes for the models I need inside the migration. 

Jason Fleetwood-Boldt

Apr 15, 2015, 8:07:13 AM
to rubyonra...@googlegroups.com
On Apr 15, 2015, at 5:07 AM, Pranas Kiziela <pranas....@gmail.com> wrote:

> Jason, how do you avoid the same data migration problems in your gem? If models you reference in a migration change, the migration would break, wouldn't it?



I don't run the data migrations from the beginning of time. I have no need to, I just dump Production data. Since the migrations have already been run on Production, they don't run again later, when a model they reference has changed.

I do run the schema migrations from the beginning of time, and I rarely if ever reference model names in them.



> Augustin, I find it best to not reference any app models but rather define new classes for the models I need inside the migration.


Also, if you can write your migration in SQL, it is typically many times faster than Ruby, so I prefer writing the migration in raw SQL when I can.

So yes, the problems you describe don't go away; I just use different strategies to deal with them.

DHH

Apr 15, 2015, 8:09:48 AM
to rubyonra...@googlegroups.com
The answer is in the default comment at the top of schema.rb:

# Note that this schema.rb definition is the authoritative source for your
# database schema. If you need to create the application database on another
# system, you should be using db:schema:load, not running all the migrations
# from scratch. The latter is a flawed and unsustainable approach (the more migrations
# you'll amass, the slower it'll run and the greater likelihood for issues).

Migrations are meant to be temporary ways of shipping schema changes between developers and production. Not as a bootstrap method.

Radan Skoric

Apr 15, 2015, 11:07:35 AM
to rubyonra...@googlegroups.com
You should be using rake db:setup, which imports the schema and loads seeds. Seeds, not migrations, are the intended place for initializing the database with data.

It is actually good to prune old migrations from time to time. For example, I like to use https://github.com/jalkoby/squasher to create a new "first" migration and then just delete all old migrations.

Rodrigo Rosenfeld Rosas

Apr 15, 2015, 1:48:47 PM
to rubyonra...@googlegroups.com
Or if you, like us, need to use some specific data types not supported by schema.rb, and you don't need to support multiple database vendors, then you should use the structure.sql format instead. But DHH's explanation is correct, I'm just amending since schema.rb may not be suitable for everyone and some people might think that the only alternative would be to re-run all migrations. It's not.

Rodrigo.